
Cannot deploy spiders when importing `urlparse`


Environment:

  1. macOS Sierra 10.12.3 (16D32)
  2. Python 3.6 [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin, installed via brew
  3. Scrapy 1.3.2
  4. shub 2.5.1

Steps:

mkdir shubissue
cd shubissue
python3 -m venv .pyenv
source .pyenv/bin/activate
pip install scrapy shub
scrapy startproject myscrapy

cd myscrapy
scrapy genspider example example.com

shub deploy
# provide project ID
# set as default
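
After answering the prompts, shub stores the chosen project ID in a scrapinghub.yml at the project root. On shub 2.5 the generated file should look roughly like this (the ID is a placeholder):

projects:
  default: XXXXXX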

Message:

{"status": "ok", "spiders": 1, "project": XXXXXX, "version": "1.0"}

Change the contents of myscrapy/myscrapy/spiders/example.py to:

import scrapy
from urllib.parse import urlparse


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/']

    def parse(self, response):
        pass
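
For context, the import would typically be consumed inside parse(); a hypothetical usage (not part of the minimal reproduction above) might look like:

    def parse(self, response):
        # hypothetical: log the host of each crawled page using the imported urlparse
        domain = urlparse(response.url).netloc
        self.logger.info("Crawled a page on %s", domain)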

Rerun:

shub deploy

Message:

{"status": "ok", "spiders": 0, "project": XXXXXX, "version": "1.0"}

Any new spiders you create will be ignored as well.

I will try to reproduce this in a Linux environment.

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

jdemaeyer commented, Feb 20, 2017 (2 reactions)

Hey @rtodea, thanks for the issue!

For compatibility reasons, Scrapy Cloud uses Python 2 by default, where the urllib module has a different structure, so your import fails with an ImportError. You can switch to Python 3 by specifying a corresponding stack in your scrapinghub.yml, e.g. like this:

projects:
  default:
    id: XXX_YOUR_PROJECT_ID
    stack: scrapy:1.3-py3
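
Alternatively, if staying on the default Python 2 stack is preferable, the import itself can be written so it works on both interpreters (a minimal compatibility sketch):

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2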

What’s curious is that your deploy didn’t fail with a build error; apparently the build went through just fine but silently dropped the spiders. This is what following your steps produced on my machine:

jakob@MosEisley ~/playground/shubissue/myscrapy % shub deploy
Packing version 1.0
Deploying to Scrapy Cloud project "43100"
Deploy log last 30 lines:
    sys.exit(list_spiders())
  File "/usr/local/lib/python2.7/dist-packages/sh_scrapy/crawl.py", line 170, in list_spiders
    _run_usercode(None, ['scrapy', 'list'], _get_apisettings)
  File "/usr/local/lib/python2.7/dist-packages/sh_scrapy/crawl.py", line 127, in _run_usercode
    _run(args, settings)
  File "/usr/local/lib/python2.7/dist-packages/sh_scrapy/crawl.py", line 87, in _run
    _run_scrapy(args, settings)
  File "/usr/local/lib/python2.7/dist-packages/sh_scrapy/crawl.py", line 95, in _run_scrapy
    execute(settings=settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 142, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 209, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 115, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 296, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 30, in from_settings
    return cls(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 21, in __init__
    for module in walk_modules(name):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/app/__main__.egg/myscrapy/spiders/example.py", line 3, in <module>
ImportError: No module named parse
{"message": "List exit code: 193", "details": null, "error": "build_error"}

{"message": "Internal build error", "status": "error"}
Deploy log location: /tmp/shub_deploy_dwkz8229.log
Error: Deploy failed: b'{"message": "Internal build error", "status": "error"}'
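
The traceback shows that the build runs scrapy list under Python 2.7 to discover spiders, and that is where the Python 3-only import blows up. The same ImportError should be reproducible locally in a Python 2 virtualenv (a sketch, assuming python2.7 and virtualenv are installed):

virtualenv -p python2.7 .py2env
source .py2env/bin/activate
pip install "scrapy<2.0"   # a Scrapy release that still supports Python 2
cd myscrapy
scrapy list                # fails with: ImportError: No module named parse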
redapple commented, Mar 27, 2017 (1 reaction)

Thanks for the heads up, @rubhanazeem!
