Running spider via CrawlerRunner from script gives Error: ReactorNotRestartable
OK, so I've tried this many times, asked most Pythonistas I know, and even asked on Stack Overflow here: https://stackoverflow.com/questions/46346863/twisted-reactor-not-restarting-in-scrapy
But I cannot for the life of me figure this out. I'm trying to execute the spider via a Telegram bot, using the python-telegram-bot API wrapper.
I’m running Python 2.7 on Windows 10.
This is my code:
from twisted.internet import reactor
from scrapy import cmdline
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters, RegexHandler
import logging
import os
import ConfigParser
import json
import textwrap
from MIS.spiders.moodle_spider import MySpider
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner, CrawlerProcess
from scrapy.utils.log import configure_logging
# Read settings from config file
config = ConfigParser.RawConfigParser()
config.read('./spiders/creds.ini')
TOKEN = config.get('BOT', 'TOKEN')
#APP_NAME = config.get('BOT', 'APP_NAME')
#PORT = int(os.environ.get('PORT', '5000'))
updater = Updater(TOKEN)
# Setting Webhook
#updater.start_webhook(listen="0.0.0.0",
# port=PORT,
# url_path=TOKEN)
#updater.bot.setWebhook(APP_NAME + TOKEN)
logging.basicConfig(format='%(asctime)s -# %(name)s - %(levelname)s - %(message)s',level=logging.INFO)
dispatcher = updater.dispatcher
def doesntRun(bot, update):
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    runner = CrawlerRunner({
        'FEED_FORMAT' : 'json',
        'FEED_URI' : 'output.json'
    })
    d = runner.crawl(MySpider)
    d.addBoth(lambda _: reactor.stop()) # this line is supposed to restart the reactor, right?
    reactor.run(installSignalHandlers=0) # the script will block here until the crawling is finished
    with open("./output.json", 'r') as file:
        contents = file.read()
    a_r = json.loads(contents)
    AM = a_r[0]['AM']
    ...
    ...
    message_template = textwrap.dedent("""
            AM: {AM}
            ...
            """)
    messageContent = message_template.format(AM=AM, ...)
    bot.sendMessage(chat_id=update.message.chat_id, text=messageContent)
# Handlers
test_handler = CommandHandler('doesntRun', doesntRun)
# Dispatchers
dispatcher.add_handler(test_handler)
updater.start_polling()
updater.idle()
Please provide some insight into how I can restart the reactor; this has been bugging me for a couple of days.
Issue Analytics
- State:
- Created 6 years ago
- Comments:8 (2 by maintainers)
Top GitHub Comments
Okay, I finally solved my problem.
The python-telegram-bot API wrapper offers an easy way to restart the bot.
I simply put the lines:
at the end of the doesntRun() function. Now whenever I call the function via the bot, it scrapes the page, stores the results, forwards the result, then restarts itself. Doing so allows me to execute the spider any number of times I want.
@WalkOnMars I later found out this method is not recommended. There's a better way to do this, which is to spawn a separate process that executes the spider each time your bot function is called.
This SO answer has a code example with it: https://stackoverflow.com/a/43661172/5129096
And this is how I’ve used the above answer in my own code: https://github.com/ArionMiles/MIS-Bot/blob/master/mis_bot/scraper/spiders/attendance_spider.py#L104
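The separate-process idea can be sketched roughly as follows. This is a minimal sketch, not the code from the linked answer or repository. It assumes Python 3.7+ rather than the Python 2.7 of the question, and the helper names run_python_snippet and crawl_in_fresh_process are hypothetical. The spider import path and the feed settings are copied from the question and may need adjusting for a newer Scrapy release. Each call starts a new interpreter, so every crawl gets its own Twisted reactor and the bot's process never has to restart one.

```python
import os
import subprocess
import sys

# Code executed by the child interpreter. The spider path and feed settings
# are copied from the question; adjust them to your own project.
CRAWL_SNIPPET = """
from scrapy.crawler import CrawlerProcess
from MIS.spiders.moodle_spider import MySpider

process = CrawlerProcess({'FEED_FORMAT': 'json', 'FEED_URI': 'output.json'})
process.crawl(MySpider)
process.start()  # blocks until the crawl ends; the child then exits
"""


def run_python_snippet(snippet, timeout=300):
    """Run `snippet` in a brand-new Python interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return completed.stdout


def crawl_in_fresh_process(output_path="output.json"):
    """Hypothetical helper: run one crawl in its own process."""
    # Remove the previous feed first so stale results are not mixed
    # with the results of the new run.
    if os.path.exists(output_path):
        os.remove(output_path)
    run_python_snippet(CRAWL_SNIPPET)
```

Inside doesntRun(), the CrawlerRunner and reactor lines would then be replaced by a single call to crawl_in_fresh_process() before output.json is read. The child inherits the bot's working directory, so the relative feed path resolves to the same file.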