Running spider via CrawlerRunner from script gives Error: ReactorNotRestartable

OK, so I’ve tried a lot of things and asked most Pythonistas I know, and I even asked on Stack Overflow here: https://stackoverflow.com/questions/46346863/twisted-reactor-not-restarting-in-scrapy

But I cannot for the life of me figure this out. I’m trying to execute the spider via a Telegram bot, using the python-telegram-bot API wrapper.

I’m running Python 2.7 on Windows 10.

This is my code:

from twisted.internet import reactor
from scrapy import cmdline
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters, RegexHandler
import logging
import os
import ConfigParser
import json
import textwrap
from MIS.spiders.moodle_spider import MySpider
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner, CrawlerProcess
from scrapy.utils.log import configure_logging


# Read settings from config file
config = ConfigParser.RawConfigParser()
config.read('./spiders/creds.ini')
TOKEN = config.get('BOT', 'TOKEN')
#APP_NAME = config.get('BOT', 'APP_NAME')
#PORT = int(os.environ.get('PORT', '5000'))
updater = Updater(TOKEN)

# Setting Webhook
#updater.start_webhook(listen="0.0.0.0",
#                      port=PORT,
#                      url_path=TOKEN)
#updater.bot.setWebhook(APP_NAME + TOKEN)

logging.basicConfig(format='%(asctime)s -# %(name)s - %(levelname)s - %(message)s',level=logging.INFO)

dispatcher = updater.dispatcher

def doesntRun(bot, update):
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    runner = CrawlerRunner({
        'FEED_FORMAT' : 'json',
        'FEED_URI' : 'output.json'
        })

    d = runner.crawl(MySpider)
    d.addBoth(lambda _: reactor.stop()) # this line is supposed to restart the reactor, right?
    reactor.run(installSignalHandlers=0) # the script will block here until the crawling is finished

    with open("./output.json", 'r') as file:
        contents = file.read()
        a_r = json.loads(contents)
        AM = a_r[0]['AM']
        ...
        ...

        message_template = textwrap.dedent("""
                AM: {AM}
                ...
                """)
        messageContent = message_template.format(AM=AM, ...)
        bot.sendMessage(chat_id=update.message.chat_id, text=messageContent)


# Handlers
test_handler = CommandHandler('doesntRun', doesntRun)

# Dispatchers
dispatcher.add_handler(test_handler)

updater.start_polling()
updater.idle()

Please provide some insight on how I can restart the reactor. This has been bugging me for a couple of days.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

ArionMiles commented, Sep 24, 2017 (4 reactions)

Okay, I finally solved my problem.

The python-telegram-bot API wrapper offers an easy way to restart the bot.

I simply put these lines (with time and sys imported at the top of the script; os is already imported):

time.sleep(0.2)
os.execl(sys.executable, sys.executable, *sys.argv)

at the end of the doesntRun() function. Now whenever I call the function via the bot, it scrapes the page, stores the results, forwards them, and then restarts itself. This lets me execute the spider as many times as I want.

ArionMiles commented, Nov 30, 2019 (3 reactions)

@WalkOnMars I later found out this method is not recommended. A better way is to spawn a separate process that runs the spider each time your bot function is called.

This SO answer has a code example with it: https://stackoverflow.com/a/43661172/5129096

And this is how I’ve used the above answer in my own code: https://github.com/ArionMiles/MIS-Bot/blob/master/mis_bot/scraper/spiders/attendance_spider.py#L104
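
The separate-process idea can be sketched with the standard library's subprocess module. This is only a sketch under assumptions, and not necessarily the mechanism used in the linked answer or repository: each crawl is started in a fresh Python interpreter through the scrapy crawl command, so every run gets its own Twisted reactor and the bot process never starts one. The helper names build_crawl_command and run_in_child and the spider name "moodle" are hypothetical; substitute the name attribute of your own spider.

```python
import subprocess
import sys


def build_crawl_command(spider_name, output_path):
    """Return the argument list for `scrapy crawl` run in a fresh interpreter."""
    # sys.executable keeps the child in the same Python environment as the bot;
    # `-m scrapy` invokes Scrapy's command-line entry point.
    return [sys.executable, "-m", "scrapy", "crawl", spider_name, "-o", output_path]


def run_in_child(command, cwd=None):
    """Run `command` in a separate process and block until it exits.

    Raises subprocess.CalledProcessError if the child exits with a non-zero status.
    """
    subprocess.check_call(command, cwd=cwd)


# Hypothetical use inside the bot handler, replacing the reactor.run() call:
#     run_in_child(build_crawl_command("moodle", "output.json"))
#     ...then read output.json as before.
```

Because the reactor lives and dies with the child process, the handler can be called any number of times without hitting ReactorNotRestartable. The command has to run from inside the Scrapy project directory (pass cwd if the bot starts elsewhere), and the old output file may need to be deleted between runs, since -o appends to an existing file.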
