Items are not passing through the pipeline in recursive callbacks (Scrapy)
Description
The item is not passed through the pipeline for the following code. Can someone tell me why? This is the main crawler file:
```python
import scrapy

from rough.items import RoughItem


class BotSpider(scrapy.Spider):
    name = 'bot'
    start_urls = ['https://www.iitbbs.ac.in']

    def parse(self, response):
        yield scrapy.Request('https://www.iitbbs.ac.in', callback=self.help)

    def help(self, response):
        yield scrapy.Request('https://www.iitbbs.ac.in', callback=self.returnObj)

    def returnObj(self, response):
        return RoughItem()
```
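As background for why this spider yields no items: every request above targets the same URL, and Scrapy's duplicate filter drops requests whose URL it has already scheduled. The toy model below (stdlib only; the names `crawl`, `seen`, and `pipeline_items` are invented for this sketch, and the real filtering is done by Scrapy's `RFPDupeFilter`) walks the same three callbacks and shows that nothing ever reaches the pipeline:

```python
# Toy model of Scrapy's request flow with a duplicate filter.
# All names here (crawl, seen, pipeline_items) are invented for this
# sketch; Scrapy's actual dupefilter is scrapy.dupefilters.RFPDupeFilter.

def crawl(start_requests):
    seen = set()            # URLs that have already been scheduled
    pipeline_items = []     # items that would reach process_item
    queue = list(start_requests)
    while queue:
        url, callback = queue.pop(0)
        if url in seen:     # the dupefilter drops repeated URLs
            continue
        seen.add(url)
        for result in callback(url):
            if isinstance(result, tuple):   # a new (url, callback) request
                queue.append(result)
            else:                           # an item
                pipeline_items.append(result)
    return pipeline_items

URL = 'https://www.iitbbs.ac.in'

def parse(url):
    yield (URL, help_cb)

def help_cb(url):
    yield (URL, return_obj)

def return_obj(url):
    yield 'RoughItem'

items = crawl([(URL, parse)])
print(items)   # [] -- the follow-up requests are dropped as duplicates
```

Note that in this model even `help_cb` never runs: the very first follow-up request already repeats the start URL, so the chain is cut off at the first hop.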
Pipeline:
```python
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

# useful for handling different item types with a single interface
from itemadapter import ItemAdapter


class RoughPipeline:
    def process_item(self, item, spider):
        # Pipelines have no self.log() method; the original self.log('Hello!!')
        # would raise AttributeError even if it were reached. Log via the
        # spider's logger instead.
        spider.logger.info('Hello!!')
        return item
```
The message is never logged, even though I have enabled the pipeline in settings.py.
Steps to Reproduce
Expected behavior: The message ‘Hello!!’ is printed by the pipeline.
Actual behavior: The message is not printed, which means the item never passes through the pipeline.
Versions
Scrapy : 2.5.0
lxml : 4.6.3.0
libxml2 : 2.9.5
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.2.0
Python : 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]
pyOpenSSL : 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021)
cryptography : 3.4.7
Platform : Windows-10-10.0.19043-SP0
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
@elacuesta Got it, many thanks!!
@YASH01009 It’s not a bug: this happens because the default dupefilter drops requests to duplicated URLs, so the callback that finally produces the item (`returnObj`) is never reached. You can set `LOG_LEVEL="DEBUG"` (optionally, also `DUPEFILTER_DEBUG=True`) to see this happening in the crawl logs.
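If the repeated URLs are intentional, the usual remedy is to pass `dont_filter=True` (a real parameter of `scrapy.Request`) on the follow-up requests so the dupefilter lets them through. The stdlib sketch below models that decision; the `should_drop`/`seen` names are invented here for illustration:

```python
seen = set()

def should_drop(url, dont_filter=False):
    # Mirrors the dupefilter's decision: requests created with
    # dont_filter=True are scheduled even if their URL was seen before.
    if dont_filter:
        return False
    if url in seen:
        return True
    seen.add(url)
    return False

url = 'https://www.iitbbs.ac.in'
print(should_drop(url))                    # False -- first request passes
print(should_drop(url))                    # True  -- duplicate is dropped
print(should_drop(url, dont_filter=True))  # False -- bypasses the filter

# In the spider itself, the equivalent change would be e.g.:
#     yield scrapy.Request('https://www.iitbbs.ac.in',
#                          callback=self.returnObj, dont_filter=True)
```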