Default downloader fails to get page

‘http://autos.msn.com/research/userreviews/reviewlist.aspx?ModelID=14749’

It looks like the default downloader, which is implemented with the Twisted library, can’t fetch the URL above. I ran ‘scrapy shell http://autos.msn.com/research/userreviews/reviewlist.aspx?ModelID=14749’ and got the following output:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.17.0', 'scrapy')
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 489, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 1207, in run_script
    execfile(script_filename, namespace, namespace)
  File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/commands/shell.py", line 47, in run
    shell.start(url=url, spider=spider)
  File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/shell.py", line 43, in start
    self.fetch(url, spider)
  File "/Library/Python/2.7/site-packages/Scrapy-0.17.0-py2.7.egg/scrapy/shell.py", line 85, in fetch
    reactor, self._schedule, request, spider)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/threads.py", line 118, in blockingCallFromThread
    result.raiseException()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/python/failure.py", line 370, in raiseException
    raise self.type, self.value, self.tb
twisted.internet.error.ConnectionDone: Connection was closed cleanly.

However, both urllib2.urlopen and requests.get download the page without problems.

Issue Analytics

  • State: open
  • Created 10 years ago
  • Reactions: 2
  • Comments: 15 (6 by maintainers)

Top GitHub Comments

3 reactions
elacuesta commented, Oct 26, 2021

Seems like this last site sends some ASCII art with its headers:

$ curl -I https://spotless.tech
HTTP/1.1 200 sP0tL3sS sP0tlLesS (╯°░°)╯︵ ┻━┻
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░░░░░░░░░░▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄░░░░░░░░░
░░░░░░░░▄▀░░░░░░░░░░░░▄░░░░░░░▀▄░░░░░░░
░░░░░░░░█░░▄░░░░▄░░░░░░░░░░░░░░█░░░░░░░
░░░░░░░░█░░░░░░░░░░░░▄█▄▄░░▄░░░█░▄▄▄░░░
░▄▄▄▄▄░░█░░░░░░▀░░░░▀█░░▀▄░░░░░█▀▀░██░░
░██▄▀██▄█░░░▄░░░░░░░██░░░░▀▀▀▀▀░░░░██░░
░░▀██▄▀██░░░░░░░░▀░██▀░░░░░░░░░░░░░▀██░
░░░░▀████░▀░░░░▄░░░██░░░▄█░░░░▄░▄█░░██░
░░░░░░░▀█░░░░▄░░░░░██░░░░▄░░░▄░░▄░░░██░
░░░░░░░▄█▄░░░░░░░░░░░▀▄░░▀▀▀▀▀▀▀▀░░▄▀░░
░░░░░░█▀▀█████████▀▀▀▀████████████▀░░░░
░░░░░░████▀░░███▀░░░░░░▀███░░▀██▀░░░░░░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
Server: Sp0tw3b
Date: Tue, 26 Oct 2021 12:07:07 GMT
Content-Type: text/html
Content-Length: 33015
Connection: keep-alive
Last-Modified: Tuesday, 26-Oct-2021 12:07:07 GMT
Cache-Control: no-store, no-cache, must-revalidate, proxy-revalidate, max-age=0
Accept-Ranges: bytes

This makes Twisted choke while parsing the response headers. The ASCII-art lines contain no b":", so splitting such a line on the colon and unpacking the result into two names raises a ValueError:

>>> a, b = b"foobar".split(b":", 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 2, got 1)

AFAICT, these are not RFC-compliant headers: “Each header field consists of a name followed by a colon (":") and the field value” (RFC 2616, section 4.2).
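
To make the failure mode concrete, here is a minimal sketch of one way to sort raw header lines into well-formed "name: value" pairs and lines with no colon, without raising. The helper name split_header_lines and the sample input are hypothetical, made up for illustration; this is not how Twisted or Scrapy parse headers, and it is not the workaround mentioned in this thread.

```python
def split_header_lines(raw_lines):
    """Separate 'name: value' header lines from lines that lack a colon.

    Hypothetical helper for illustration only; not Twisted or Scrapy code.
    """
    headers = []
    malformed = []
    for line in raw_lines:
        # bytes.partition never raises: when b":" is absent, sep is b"".
        name, sep, value = line.partition(b":")
        if sep and name.strip():
            headers.append((name.strip(), value.strip()))
        else:
            # No colon (or an empty name): not a valid header field.
            malformed.append(line)
    return headers, malformed


# Made-up sample input mixing valid headers with a colon-less line.
raw = [
    b"Server: Sp0tw3b",
    b"not a header line",
    b"Date: Tue, 26 Oct 2021 12:07:07 GMT",
]
headers, malformed = split_header_lines(raw)
print(headers)
print(malformed)
```

Using partition instead of split with tuple unpacking means a colon-less line ends up in the malformed list rather than aborting the whole response. Whether such lines should be skipped or treated as a hard error is a policy decision this sketch does not settle.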

2 reactions
0xbf00 commented, Sep 18, 2018

I’ve written up a workaround here.
