
Cannot yield items from a for loop in a sub-function called by parse


Here is an example:

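from multiprocessing import Pool

import scrapy
from scrapy.http import Request

# ApkItem (the item class) and kkk (a helper passed to the pool) are not shown in the post.

class HiapkSpider(scrapy.spider.Spider):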
    name="hiapk"
    allowed_domains=['apk.hiapk.com']

    def __init__(self,start=1,end=2):
        self.prefix_url="http://apk.hiapk.com/apps?sort=5&pi="
        self.base_url="http://apk.hiapk.com"
        self.prefix_path="../temp/"
        self.page_start=int(start)
        self.page_index=self.page_start
        self.page_end=int(end)
        HiapkSpider.start_urls=[self.prefix_url+str(self.page_start)]

    def parse(self,response):
        self.apkList=[]
        if self.page_index > self.page_end:
            print 'Spider Completed'
            return
        for sel in response.xpath('//li[contains(@class,"list_item")]'):
            item=ApkItem()
            item['name']=sel.xpath('div/dl/dt/span/a/@href').extract()[0].split('.')[-1]
            item['version']=sel.xpath('div/dl/dt/*[2]/text()').extract()[0][1:-1]
            item['url']=self.base_url+sel.xpath('div/*[3]/a/@href').extract()[0]
            item['path']=self.prefix_path+item['name']+'.apk'
            self.apkList.append(item)
        self.download()
        self.page_index+=1
        for item in self.apkList:
            yield item
        yield Request(self.prefix_url+str(self.page_index),
                      callback=self.parse)


    def download(self):
        print 'DownLoad Start'
        p=Pool()
        result=[]
        for item in self.apkList:
            # tmp=p.apply_async(urllib.urlretrieve,[item['url'],item['path']])
            tmp=p.apply_async(kkk,[2])
            result.append(tmp)
        p.close()
        p.join()
        for i in range(0,len(self.apkList)):
            try:
                print result[i].get()
            except:
                continue
            else:
                yield self.apkList[i]

If I use yield inside the for loop of the download function, the function is never executed. But if I yield directly in the download function, it works. Very strange!
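The behaviour described above follows from how Python generators work, independent of Scrapy. Below is a minimal plain-Python sketch; `download` and `calls` are illustrative names, not taken from the original spider. Because the function body contains `yield`, calling it only creates a generator object, and the body, including its side effects, runs only when something iterates over that object.

```python
# Minimal sketch: calling a generator function does not run its body.
calls = []  # records when the body of download() actually starts


def download(items):
    # Because this body contains `yield`, download() is a generator function.
    calls.append("download started")  # stands in for print 'DownLoad Start'
    for item in items:
        yield item


gen = download(["a.apk", "b.apk"])  # creates a generator; the body has not run
calls_before = list(calls)          # snapshot taken before iterating: still empty
result = list(gen)                  # iterating finally runs the body
print(calls_before)  # []
print(result)        # ['a.apk', 'b.apk']
print(calls)         # ['download started']
```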

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

curita commented, Apr 3, 2015 (2 reactions)

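download() contains a yield statement, so it is a generator function: self.download() returns an iterator, and none of its body runs until that iterator is consumed. You have to iterate over it to get its results. Replacing that line with: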

for item in self.download():
    yield item

should do the job.

I only skimmed through the code, but this looks like a support question rather than a bug. I encourage you to use our scrapy-users mailing list for those.
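A plain-Python sketch of the suggested fix, with no Scrapy involved. `FakeSpider` and the `ok` flag are hypothetical stand-ins for HiapkSpider and for the per-item success check in download(). It contrasts the original bare call with the for-loop delegation suggested in the comment, and with `yield from`, which performs the same delegation but requires Python 3.3 or later (the question's code is Python 2).

```python
class FakeSpider(object):
    """Hypothetical stand-in for HiapkSpider; no Scrapy involved."""

    def __init__(self, apk_list):
        self.apkList = apk_list

    def download(self):
        # Generator: yields only items whose (simulated) download succeeded.
        for item in self.apkList:
            if item.get("ok", True):
                yield item

    def parse_original(self):
        self.download()  # builds a generator and discards it; nothing runs
        yield "next-page-request"

    def parse_for_loop(self):
        # The fix from the comment; works on Python 2 and 3.
        for item in self.download():
            yield item
        yield "next-page-request"

    def parse_yield_from(self):
        # Python 3.3+ only (PEP 380): delegate to the sub-generator.
        yield from self.download()
        yield "next-page-request"


spider = FakeSpider([{"name": "a"}, {"name": "b", "ok": False}])
print(list(spider.parse_original()))    # ['next-page-request']
print(list(spider.parse_for_loop()))    # [{'name': 'a'}, 'next-page-request']
print(list(spider.parse_yield_from()))  # [{'name': 'a'}, 'next-page-request']
```

One thing to watch in the original parse(): after the self.download() line it still loops over self.apkList and yields every item. If the fix is applied and that loop is kept, each item that download() yields would be emitted a second time, so one of the two loops is probably redundant.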

kmike commented, Apr 14, 2015 (0 reactions)

It doesn’t look like an issue with Scrapy. Feel free to reopen if you still think there is some problem.

As a side note, you can get async downloads without using a thread pool (see http://doc.scrapy.org/en/master/topics/request-response.html#passing-additional-data-to-callback-functions). There are tickets to make it easier: #1144, #1138.
