Running scrapy shell against a local file
Before Scrapy 1.0, I could execute:
scrapy shell index.html
Since 1.0, the same command raises ValueError: Missing scheme in request url: index.html:
$ scrapy shell index.html
2015-10-12 15:32:59 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2015-10-12 15:32:59 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-10-12 15:32:59 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/so/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/commands/shell.py", line 50, in run
    spidercls = spidercls_for_request(spider_loader, Request(url),
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 24, in __init__
    self._set_url(url)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 59, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: index.html
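The message points at the missing scheme: a bare filename such as index.html parses as a path with no scheme part. A minimal standard-library sketch of that distinction follows. It is only an illustration, not Scrapy's actual validation code, and has_scheme is a hypothetical helper name:

```python
from urllib.parse import urlparse


def has_scheme(url: str) -> bool:
    """Return True if the string carries a URL scheme such as http or file."""
    # urlparse() leaves .scheme empty when the input is just a path.
    return bool(urlparse(url).scheme)


print(has_scheme("index.html"))
print(has_scheme("file:///absolute/path/to/index.html"))
```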
As a workaround, I’ve used the “file” protocol providing the full path to a file:
$ scrapy shell file:///absolute/path/to/index.html
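Typing the absolute path by hand is error-prone. One way to build the file URI from a relative path is pathlib from the Python 3 standard library. This is a sketch that assumes index.html sits in the current directory; local_file_uri is a hypothetical helper name, not part of Scrapy:

```python
from pathlib import Path


def local_file_uri(path: str) -> str:
    """Turn a relative or absolute filesystem path into a file:// URI."""
    # resolve() makes the path absolute; as_uri() only accepts absolute paths.
    return Path(path).resolve().as_uri()


print(local_file_uri("index.html"))
```

The printed URI can then be passed to scrapy shell in place of the bare filename.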
A comment on the related Stack Overflow question, http://stackoverflow.com/questions/33088877/scrapy-shell-against-a-local-file, shows that the relevant change was introduced here.
Would it be possible, and would it make sense, to bring the previous behavior back so that we can run the shell against a local file as easily as scrapy shell filename?
Thanks!
Top GitHub Comments
Good to see that scrapy shell “./path/to/file/hello.html” works.
But the same URL doesn’t work in a spider. Can anyone help with that, or confirm that it is not supposed to work there?
Thanks everyone for the time! Glad to see it being a part of 1.1.