Docker Quickstart Error for Crawler container
See original GitHub issue

Hi guys, thanks for this amazing project!
I'm having some trouble setting up a dockerized cluster. I am following the quickstart step by step, but the online test does not pass inside the scrapycluster_crawler_1 container (and only there).
My local setup:
MacBook Pro - High Sierra
Docker4Mac Version 17.12.0-ce-mac47 (21805)
Below is the full console output:
root@e5bea50cbe71:/usr/src/app# ./run_docker_tests.sh
/usr/src/app/crawling/distributed_scheduler.py:8: ScrapyDeprecationWarning: Module `scrapy.conf` is deprecated, use `crawler.settings` attribute instead
from scrapy.conf import settings
test_change_config (test_distributed_scheduler.TestDistributedSchedulerChangeConfig) ... ok
test_create_queues (test_distributed_scheduler.TestDistributedSchedulerCreateQueues) ... ok
test_enqueue_request (test_distributed_scheduler.TestDistributedSchedulerEnqueueRequest) ... /usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:306: SystemTimeWarning: System time is way off (before 2016-01-01). This will probably lead to SSL verification errors
SystemTimeWarning
ok
test_error_config (test_distributed_scheduler.TestDistributedSchedulerErrorConfig) ... ok
test_expire_queues (test_distributed_scheduler.TestDistributedSchedulerExpireQueues) ... ok
test_find_item (test_distributed_scheduler.TestDistributedSchedulerFindItem) ... ok
test_fit_scale (test_distributed_scheduler.TestDistributedSchedulerFitScale) ... ok
test_load_domain_config (test_distributed_scheduler.TestDistributedSchedulerLoadDomainConfig) ... ok
test_next_request (test_distributed_scheduler.TestDistributedSchedulerNextRequest) ... ok
test_parse_cookie (test_distributed_scheduler.TestDistributedSchedulerParseCookie) ... ok
test_update_domain_queues (test_distributed_scheduler.TestDistributedSchedulerUpdateDomainQueues) ... ok
test_link_spider_parse (test_link_spider.TestLinkSpider) ... ok
/usr/src/app/crawling/log_retry_middleware.py:10: ScrapyDeprecationWarning: Importing from scrapy.xlib.tx is deprecated and will no longer be supported in future Scrapy versions. Update your code to import from twisted proper.
from scrapy.xlib.tx import ResponseFailed
test_lrm_stats_setup (test_log_retry_middleware.TestLogRetryMiddlewareStats) ... ok
test_mpm_middleware (test_meta_passthrough_middleware.TestMetaPassthroughMiddleware) ... ok
test_process_item (test_pipelines.TestKafkaPipeline) ... ok
test_process_item (test_pipelines.TestLoggingBeforePipeline) ... ok
test_dupe_filter (test_redis_dupefilter.TestRedisDupefilter) ... ok
test_retries (test_redis_retry_middleware.TestRedisRetryMiddleware) ... ok
test_load_stats_codes (test_redis_stats_middleware.TestRedisStatsMiddleware) ... ok
test_rsm_input (test_redis_stats_middleware.TestRedisStatsMiddleware) ... ok
test_link_spider_parse (test_wandering_spider.TestWanderingSpider) ... ok
----------------------------------------------------------------------
Ran 21 tests in 4.665s
OK
/usr/src/app/crawling/spiders/link_spider.py:6: ScrapyDeprecationWarning: Module `scrapy.conf` is deprecated, use `crawler.settings` attribute instead
from scrapy.conf import settings
test_crawler_process (__main__.TestLinkSpider) ... /usr/src/app/crawling/log_retry_middleware.py:10: ScrapyDeprecationWarning: Importing from scrapy.xlib.tx is deprecated and will no longer be supported in future Scrapy versions. Update your code to import from twisted proper.
from scrapy.xlib.tx import ResponseFailed
2018-01-16 22:30:13,058 [sc-crawler] INFO: Changed Public IP: None -> b'87.4.65.220'
ERROR
======================================================================
ERROR: test_crawler_process (__main__.TestLinkSpider)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/online.py", line 92, in test_crawler_process
m = next(self.consumer)
File "/usr/local/lib/python2.7/site-packages/future/builtins/newnext.py", line 65, in newnext
raise e
StopIteration
----------------------------------------------------------------------
Ran 1 test in 36.955s
FAILED (errors=1)
integration tests failed
Thanks!
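For context on the traceback above: the StopIteration comes from calling next() on the Kafka consumer when no message arrives before the consumer's iterator timeout, so the test errors out instead of reporting a clear failure. A minimal sketch of a more forgiving poll (poll_one is a hypothetical helper name, and an exhausted iterator stands in for a timed-out consumer):

```python
def poll_one(consumer, default=None):
    """Return the next message from an iterator-style consumer,
    or `default` when the consumer raises StopIteration (i.e. it
    timed out without receiving any message)."""
    try:
        return next(consumer)
    except StopIteration:
        return default

# An empty iterator mimics a consumer whose timeout elapsed.
assert poll_one(iter([])) is None
# With a message available, poll_one returns it unchanged.
assert poll_one(iter(["crawl-result"])) == "crawl-result"
```

A None result here would at least show that the crawler never published anything to the Kafka topic, rather than surfacing a bare StopIteration.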
Issue Analytics
- Created: 6 years ago
- Comments: 6
Actually, I may have found the issue; I will be releasing a 1.2.1 hotfix, hopefully today. Use a different website to execute the crawl, like
http://dmoztools.net
. Your crawler should be working fine, but the new IST Research website appears to cause issues with the javascript inside of the scraper. All of the integration tests passed here when I changed the urls at this commit.
Thanks @madisonb !!