
Errors occur when the crawler is closed

See original GitHub issue

Description

The following error occurs when the crawler is closed: 'NoneType' object has no attribute 'start_requests'

Log (the Chinese message from Sogou, 我们的系统检测到您网络中存在异常访问请求, means "our system has detected abnormal access requests from your network"):

{'finish_reason': 'response msg error 我们的系统检测到您网络中存在异常访问请求, url '
                  'https://weixin.sogou.com/weixin?type=2&s_from=input&query=mjl_tfsteel&ie=utf8&_sug_=y&_sug_type_=!',
 'finish_time': datetime.datetime(2019, 8, 30, 9, 22, 3, 628463),
 'memusage/max': 679272448,
 'memusage/startup': 679272448,
 'start_time': datetime.datetime(2019, 8, 30, 9, 22, 2, 54762)}
2019-08-30 17:22:03,628 [scrapy.core.engine] INFO: Spider closed (response msg error 我们的系统检测到您网络中存在异常访问请求, url https://weixin.sogou.com/weixin?type=2&s_from=input&query=mjl_tfsteel&ie=utf8&_sug_=y&_sug_type_=!)
2019-08-30 17:22:04,016 [twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.6/site-packages/scrapy/commands/crawl.py", line 58, in run
    self.crawler_process.start()
  File "/home/user/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 293, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/home/user/.local/lib/python3.6/site-packages/twisted/internet/base.py", line 1272, in run
    self.mainLoop()
  File "/home/user/.local/lib/python3.6/site-packages/twisted/internet/base.py", line 1281, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/home/user/.local/lib/python3.6/site-packages/twisted/internet/base.py", line 902, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/home/user/.local/lib/python3.6/site-packages/scrapy/utils/reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
  File "/home/user/.local/lib/python3.6/site-packages/scrapy/core/engine.py", line 137, in _next_request
    if self.spider_is_idle(spider) and slot.close_if_idle:
  File "/home/user/.local/lib/python3.6/site-packages/scrapy/core/engine.py", line 189, in spider_is_idle
    if self.slot.start_requests is not None:
builtins.AttributeError: 'NoneType' object has no attribute 'start_requests'

Code

# Method of the reporter's spider. It depends on these imports (not shown
# in the original report): random, requests, urllib.parse as parse, and
# scrapy.utils.project.get_project_settings.
def start_requests(self):
    for wechat_config in self.wechat_list:
        wechat_name = wechat_config.name
        variety = wechat_config.variety
        search_allow_rule = wechat_config.search_allow_rule
        wechat_id = wechat_config.wechat_id
        wx_id = wechat_config.wx_id

        # Build the Sogou Weixin search URL for this account.
        url = "https://weixin.sogou.com/weixin?type=2&s_from=input&query={}&ie=utf8&_sug_=y&_sug_type_=".format(parse.quote(wechat_id))
        self.headers = {
            "Host": 'weixin.sogou.com',
            "Upgrade-Insecure-Requests": '1',
            "Accept": 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
            "Referer": 'https://weixin.sogou.com/',
            "Accept-Encoding": 'gzip, deflate, br',
            "Accept-Language": 'en-US,en;q=0.9',
        }
        # Rotate the User-Agent from the project settings.
        self.headers['User-Agent'] = random.choice(get_project_settings().get('MC_USER_AGENT'))
        # Blocking HTTP call made directly inside start_requests, bypassing
        # Scrapy's scheduler and downloader.
        response = requests.get(url, headers=self.headers, cookies={}, timeout=300)
        if str(response.content.decode('utf-8')).find(self.error_msg) > 0:
            # Closing the spider from inside start_requests is what triggers
            # the AttributeError above: the engine resets its slot to None
            # while a _next_request call is still pending on the reactor.
            self.crawler.engine.close_spider(self, 'response msg error {}, url {}!'.format(self.error_msg, url))
            return
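
For contrast, the same flow can be expressed with Scrapy's own request machinery, so the spider never has to close itself from inside start_requests. This is only a sketch under assumptions: it reuses the reporter's wechat_list and error_msg attributes, invents the spider name, and moves the block-page check into a callback, where raising CloseSpider (as discussed in the comments below) is the supported way to stop the crawl:

import random
from urllib import parse

import scrapy
from scrapy.exceptions import CloseSpider


class WechatSearchSpider(scrapy.Spider):  # hypothetical name, for illustration
    name = 'wechat_search'

    def start_requests(self):
        for wechat_config in self.wechat_list:
            url = ("https://weixin.sogou.com/weixin?type=2&s_from=input"
                   "&query={}&ie=utf8&_sug_=y&_sug_type_=".format(
                       parse.quote(wechat_config.wechat_id)))
            # self.settings gives access to the same MC_USER_AGENT list.
            headers = {'User-Agent': random.choice(
                self.settings.get('MC_USER_AGENT'))}
            # Let Scrapy schedule and download the request.
            yield scrapy.Request(url, headers=headers, callback=self.parse_search)

    def parse_search(self, response):
        if self.error_msg in response.text:
            # Raised from a callback, CloseSpider shuts the crawl down cleanly.
            raise CloseSpider('response msg error {}, url {}'.format(
                self.error_msg, response.url))
        # ... continue parsing the search results ...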

Versions

Scrapy (1.6.0)

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

2 reactions
Gallaecio commented, Aug 31, 2019

Maybe we can look into improving error handling here, so the root issue is more obvious from the error message.
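
One possible shape for that improvement, sketched here hypothetically rather than taken from Scrapy's code, is an explicit guard before the slot is dereferenced, so the failure names its actual cause:

def spider_is_idle(self, spider):
    # Hypothetical guard: raise a descriptive error instead of an opaque
    # AttributeError when the engine slot is missing (e.g. because
    # close_spider() ran before the spider finished opening).
    if self.slot is None:
        raise RuntimeError('Engine slot not assigned for %s; was the spider '
                           'closed during start_requests?' % spider.name)
    if self.slot.start_requests is not None:
        return False
    # ... remaining idle checks unchanged ...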

0 reactions
elacuesta commented, Sep 2, 2019

@Luokun2016 Please note that the docs for CloseSpider say it should be raised from a request callback. Also, the exception is not actually being raised, just instantiated, which has no effect:

>>> from scrapy.exceptions import CloseSpider
>>> CloseSpider('Reason')
CloseSpider()
>>> raise CloseSpider('Reason')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
scrapy.exceptions.CloseSpider
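
In other words, inside a spider the statement has to be raise CloseSpider(...), and it has to run in a request callback. A minimal illustration (the condition string is a placeholder, not from the original issue):

from scrapy.exceptions import CloseSpider

def parse(self, response):
    if 'blocked' in response.text:  # placeholder condition
        # CloseSpider('blocked')  <- no effect: the object is created and discarded
        raise CloseSpider('blocked')  # correct: the engine stops the spider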