Cannot yield item in a sub-function of the parse process
See original GitHub issue. Here I will give an example:
import urllib
from multiprocessing import Pool

import scrapy
from scrapy.http import Request


class HiapkSpider(scrapy.spider.Spider):
    name = "hiapk"
    allowed_domains = ['apk.hiapk.com']

    def __init__(self, start=1, end=2):
        self.prefix_url = "http://apk.hiapk.com/apps?sort=5&pi="
        self.base_url = "http://apk.hiapk.com"
        self.prefix_path = "../temp/"
        self.page_start = int(start)
        self.page_index = self.page_start
        self.page_end = int(end)
        HiapkSpider.start_urls = [self.prefix_url + str(self.page_start)]

    def parse(self, response):
        self.apkList = []
        if self.page_index > self.page_end:
            print 'Spider Completed'
            return
        for sel in response.xpath('//li[contains(@class,"list_item")]'):
            item = ApkItem()  # ApkItem is the asker's item class (definition not shown)
            item['name'] = sel.xpath('div/dl/dt/span/a/@href').extract()[0].split('.')[-1]
            item['version'] = sel.xpath('div/dl/dt/*[2]/text()').extract()[0][1:-1]
            item['url'] = self.base_url + sel.xpath('div/*[3]/a/@href').extract()[0]
            item['path'] = self.prefix_path + item['name'] + '.apk'
            self.apkList.append(item)
        self.download()  # the call the question is about
        self.page_index += 1
        for item in self.apkList:
            yield item
        yield Request(self.prefix_url + str(self.page_index),
                      callback=self.parse)

    def download(self):
        print 'DownLoad Start'
        p = Pool()
        result = []
        for item in self.apkList:
            # tmp = p.apply_async(urllib.urlretrieve, [item['url'], item['path']])
            tmp = p.apply_async(kkk, [2])  # kkk is a test helper from the asker's code (not shown)
            result.append(tmp)
        p.close()
        p.join()
        for i in range(0, len(self.apkList)):
            try:
                print result[i].get()
            except:
                continue
            else:
                yield self.apkList[i]
If I use yield in the for loop of the download function, this function is not executed at all. But if I yield directly in the download function, everything is all right. Very strange!
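The behavior the asker finds strange follows from how Python generators work: a function containing `yield` anywhere in its body becomes a generator function, so calling it only creates a generator object and runs none of its code until that object is iterated. A minimal sketch (names are illustrative, not taken from the spider above):

```python
def download():
    # A yield anywhere in the body makes this a generator function.
    print('DownLoad Start')   # not printed by the bare call below
    yield 'item'

result = download()   # body does NOT run; we only get a generator object
items = list(result)  # iteration finally executes the body
print(items)          # -> ['item']
```

This is exactly why `self.download()` on its own appears to "not execute": the generator it returns is immediately discarded without ever being iterated.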
Issue Analytics
- Created: 8 years ago
- Comments: 5 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
self.download() returns a generator; you have to iterate over its results to get them. Replacing that line with a loop that re-yields each result (e.g. for item in self.download(): yield item) should do the work.
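The fix the maintainer describes is to iterate over the generator inside `parse` and re-yield each item. A minimal sketch of the pattern, with the Scrapy machinery stripped out (class and item names are illustrative):

```python
class Sketch(object):
    def download(self):
        # generator: its body runs only when iterated
        for item in ['a.apk', 'b.apk']:
            yield item

    def parse(self):
        # self.download()              # <- original bug: generator created, then dropped
        for item in self.download():   # iterate, re-yielding each result
            yield item

print(list(Sketch().parse()))  # -> ['a.apk', 'b.apk']
```

On Python 3.3+ the loop can be shortened to `yield from self.download()`, which delegates to the inner generator with the same effect.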
I just skimmed through that code, but this seems like a support question, not a bug. I encourage you to use our scrapy-users mailing list for those.
It doesn’t look like an issue with Scrapy. Feel free to reopen if you still think there is some problem.
As a side note, you can get async downloads without using a thread pool (see http://doc.scrapy.org/en/master/topics/request-response.html#passing-additional-data-to-callback-functions). There are tickets to make it easier: #1144, #1138.