Default downloader fails to get page
"http://autos.msn.com/research/userreviews/reviewlist.aspx?ModelID=14749"
It looks like the default downloader, which is implemented with the Twisted library, can't fetch the above URL. I ran "scrapy shell http://autos.msn.com/research/userreviews/reviewlist.aspx?ModelID=14749" and got the following output.
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 5, in <module>
pkg_resources.run_script('Scrapy==0.17.0', 'scrapy')
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 489, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 1207, in run_script
execfile(script_filename, namespace, namespace)
File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
execute()
File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/cmdline.py", line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/cmdline.py", line 88, in _run_print_help
func(*a, **kw)
File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/commands/shell.py", line 47, in run
shell.start(url=url, spider=spider)
File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/shell.py", line 43, in start
self.fetch(url, spider)
File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/shell.py", line 85, in fetch
reactor, self._schedule, request, spider)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/threads.py", line 118, in blockingCallFromThread
result.raiseException()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/python/failure.py", line 370, in raiseException
raise self.type, self.value, self.tb
twisted.internet.error.ConnectionDone: Connection was closed cleanly.
However, both urllib2.urlopen and requests.get download the page without any problem.
Issue Analytics
- State:
- Created 10 years ago
- Reactions:2
- Comments:15 (6 by maintainers)
Top GitHub Comments
Seems like this last site sends some ASCII art with its headers:
which makes Twisted choke on this line. There is no b":" in the received header, hence the ValueError.
AFAICT, these are not RFC-compliant headers: "Each header field consists of a name followed by a colon (":") and the field value" (RFC 2616, section 4.2).
I've written up a workaround here.
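For illustration only (this is not the linked workaround, and not Twisted's actual code): a minimal sketch of why a header line without b":" ends in a ValueError, assuming the parser splits each line on the first colon and unpacks the result into a name and a value. The function names strict_parse and lenient_parse and the sample header lines are made up for this example.

```python
def strict_parse(line):
    """Split a raw header line on the first colon, as a strict parser might."""
    # With no b":" in the line, split() returns a single element and the
    # tuple unpacking below raises ValueError.
    name, value = line.split(b":", 1)
    return name.strip(), value.strip()


def lenient_parse(lines):
    """Hypothetical tolerant variant: keep well-formed headers, set aside the rest."""
    headers, skipped = [], []
    for line in lines:
        try:
            headers.append(strict_parse(line))
        except ValueError:
            # Malformed line (e.g. ASCII art): record it instead of failing.
            skipped.append(line)
    return headers, skipped


# Made-up response headers; the middle line stands in for the ASCII art.
raw = [
    b"Content-Type: text/html",
    b".-~*~-. made-up ASCII art line",
    b"Server: example",
]
headers, skipped = lenient_parse(raw)
```

Clients such as urllib2 and requests are evidently more forgiving about such lines, which would be consistent with them downloading the page while the strict split fails.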