
Scrapy chokes on HTTP response status lines without a Reason phrase

Try fetching the page:

$ scrapy fetch 'http://www.gidroprofmontag.ru/bassein/sbornue_basseynu'

Output:

2013-07-11 09:15:37+0400 [scrapy] INFO: Scrapy 0.17.0-304-g3fe2a32 started (bot: amon)
/home/tonal/amon/amon/amon/downloadermiddleware/blocked.py:6: ScrapyDeprecationWarning: Module `scrapy.stats` is deprecated, use `crawler.stats` attribute instead
  from scrapy.stats import stats
2013-07-11 09:15:37+0400 [amon_ra] INFO: Spider opened
2013-07-11 09:15:37+0400 [amon_ra] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-07-11 09:15:37+0400 [amon_ra] ERROR: Error downloading <GET http://www.gidroprofmontag.ru/bassein/sbornue_basseynu>: [<twisted.python.failure.Failure <class 'scrapy.xlib.tx._newclient.ParseError'>>]
2013-07-11 09:15:37+0400 [amon_ra] INFO: Closing spider (finished)
2013-07-11 09:15:37+0400 [amon_ra] INFO: Dumping Scrapy stats:
        {'downloader/exception_count': 1,
         'downloader/exception_type_count/scrapy.xlib.tx._newclient.ResponseFailed': 1,
         'downloader/request_bytes': 256,
         'downloader/request_count': 1,
         'downloader/request_method_count/GET': 1,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2013, 7, 11, 5, 15, 37, 512010),
         'log_count/ERROR': 1,
         'log_count/INFO': 4,
         'scheduler/dequeued': 1,
         'scheduler/dequeued/memory': 1,
         'scheduler/enqueued': 1,
         'scheduler/enqueued/memory': 1,
         'start_time': datetime.datetime(2013, 7, 11, 5, 15, 37, 257898)}
2013-07-11 09:15:37+0400 [amon_ra] INFO: Spider closed (finished)
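
As the title says, the ParseError here comes from the server answering with a status line that has no reason phrase (e.g. "HTTP/1.1 200" instead of "HTTP/1.1 200 OK"), which the bundled Twisted HTTP/1.1 client (scrapy.xlib.tx._newclient in the log above) refuses to parse. For a local reproduction, here is a rough sketch of a throwaway server that returns such a status line; the host, port, and response body are made up for illustration and are not taken from the issue:

import socket

HOST, PORT = "127.0.0.1", 8000   # placeholder host/port

body = b"<html><body>hello</body></html>"
# The status line below deliberately omits the reason phrase: just "200", nothing after it.
response = (
    b"HTTP/1.1 200\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"Connection: close\r\n"
    b"\r\n" + body
)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    print("Serving on http://%s:%d/ - try: scrapy fetch 'http://%s:%d/'" % (HOST, PORT, HOST, PORT))
    while True:
        conn, _ = srv.accept()
        with conn:
            conn.recv(65536)       # read and discard the incoming request
            conn.sendall(response)

Pointing scrapy fetch at http://127.0.0.1:8000/ should then fail with the same ResponseFailed/ParseError as in the log above, at least on Scrapy versions from this era.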

Issue Analytics

  • State: closed
  • Created: 10 years ago
  • Comments: 37 (24 by maintainers)

Top GitHub Comments

2 reactions
mohanbe commented on Dec 4, 2018

By default Scrapy ignores error responses such as 404; this filtering is done by the HttpError spider middleware.

So, add HTTPERROR_ALLOW_ALL = True to your settings file.

After this you can access response.status in your parse function.
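
For illustration, a minimal sketch of that suggestion; the spider name is a placeholder, and HTTPERROR_ALLOW_ALL is set per spider via custom_settings rather than in the project settings file:

import scrapy

class StatusSpider(scrapy.Spider):
    # Spider name is a placeholder for illustration.
    name = "status_spider"
    start_urls = ["http://www.gidroprofmontag.ru/bassein/sbornue_basseynu"]

    # Equivalent to adding HTTPERROR_ALLOW_ALL = True to settings.py,
    # but scoped to this one spider.
    custom_settings = {"HTTPERROR_ALLOW_ALL": True}

    def parse(self, response):
        # Non-2xx responses are no longer filtered out, so they reach parse()
        # and response.status can be inspected here.
        self.logger.info("Got status %s for %s", response.status, response.url)

Either form works: the settings.py line applies project-wide, while custom_settings limits the override to this one spider.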

2 reactions
rmax commented on Feb 28, 2017
