Running spider via CrawlerRunner from script gives Error: ReactorNotRestartable

OK, so I’ve tried a lot of things and asked most Pythonistas I know, and I even asked on Stack Overflow here: https://stackoverflow.com/questions/46346863/twisted-reactor-not-restarting-in-scrapy

But I cannot for the life of me figure this out. I’m trying to execute the spider via a Telegram bot, using the python-telegram-bot API wrapper.

I’m running Python 2.7 on Windows 10.

This is my code:

from twisted.internet import reactor
from scrapy import cmdline
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters, RegexHandler
import logging
import os
import ConfigParser
import json
import textwrap
from MIS.spiders.moodle_spider import MySpider
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner, CrawlerProcess
from scrapy.utils.log import configure_logging


# Read settings from config file
config = ConfigParser.RawConfigParser()
config.read('./spiders/creds.ini')
TOKEN = config.get('BOT', 'TOKEN')
#APP_NAME = config.get('BOT', 'APP_NAME')
#PORT = int(os.environ.get('PORT', '5000'))
updater = Updater(TOKEN)

# Setting Webhook
#updater.start_webhook(listen="0.0.0.0",
#                      port=PORT,
#                      url_path=TOKEN)
#updater.bot.setWebhook(APP_NAME + TOKEN)

logging.basicConfig(format='%(asctime)s -# %(name)s - %(levelname)s - %(message)s',level=logging.INFO)

dispatcher = updater.dispatcher

def doesntRun(bot, update):
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    runner = CrawlerRunner({
        'FEED_FORMAT' : 'json',
        'FEED_URI' : 'output.json'
        })

    d = runner.crawl(MySpider)
    d.addBoth(lambda _: reactor.stop()) # this line is supposed to restart the reactor, right?
    reactor.run(installSignalHandlers=0) # the script will block here until the crawling is finished

    with open("./output.json", 'r') as file:
        contents = file.read()
        a_r = json.loads(contents)
        AM = a_r[0]['AM']
        ...
        ...

        message_template = textwrap.dedent("""
                AM: {AM}
                ...
                """)
        messageContent = message_template.format(AM=AM, ...)
        bot.sendMessage(chat_id=update.message.chat_id, text=messageContent)


# Handlers
test_handler = CommandHandler('doesntRun', doesntRun)

# Dispatchers
dispatcher.add_handler(test_handler)

updater.start_polling()
updater.idle()

Please provide insight on how I can restart the reactor; this has been bugging me for a couple of days.

Issue Analytics

State:
Created 6 years ago
Comments: 8 (2 by maintainers)

Top GitHub Comments

4 reactions

ArionMiles commented, Sep 24, 2017

Okay, I finally solved my problem.

The python-telegram-bot API wrapper offers an easy way to restart the bot.

I simply put these lines (which also need time and sys imported at the top of the script):

time.sleep(0.2)
os.execl(sys.executable, sys.executable, *sys.argv)

at the end of the doesntRun() function. Now whenever I call the function via the bot, it scrapes the page, stores the results, forwards the result, then restarts itself. Because each restart is a fresh Python process with a fresh reactor, I can execute the spider as many times as I want.

3 reactions

ArionMiles commented, Nov 30, 2019

@WalkOnMars I later found out this method is not recommended. There’s a better way to do this, which is to spawn a separate process which executes the spider when your bot function is executed.

This SO answer has a code example with it: https://stackoverflow.com/a/43661172/5129096

And this is how I’ve used the above answer in my own code: https://github.com/ArionMiles/MIS-Bot/blob/master/mis_bot/scraper/spiders/attendance_spider.py#L104
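
For readers who want the separate-process idea in its smallest form, here is a rough sketch. It is not the code from the linked answer or repository; it uses only the standard library's subprocess module to launch the scrapy crawl command in a child interpreter, so each crawl gets its own Twisted reactor and the bot process never has to restart one. The function names and the spider name "moodle" are hypothetical, and the sketch assumes Scrapy is installed and the command is run from the Scrapy project directory.

```python
import subprocess
import sys


def build_crawl_command(spider_name, output_path):
    """Build the argument list for one Scrapy crawl in a new interpreter.

    spider_name is the spider's `name` attribute (hypothetical here),
    output_path is the feed file the crawl should write.
    """
    # `python -m scrapy crawl <name> -o <file>` is the module form of the
    # `scrapy crawl` command line tool.
    return [sys.executable, "-m", "scrapy", "crawl", spider_name, "-o", output_path]


def run_in_fresh_process(command, cwd=None):
    """Run a command in a child process, wait for it, return its exit code."""
    # The child owns its own reactor, which is started once and thrown away
    # when the child exits, so ReactorNotRestartable cannot occur in the parent.
    return subprocess.call(command, cwd=cwd)
```

Inside the bot handler, the reactor.run() block would then be replaced by something like run_in_fresh_process(build_crawl_command("moodle", "output.json")), followed by reading output.json once the call returns 0. Depending on the Scrapy version, -o may append to an existing file, so deleting a stale output.json before each run is the safer choice.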