Ray start crashes due to redis failing to start
System information
- OS Platform and Distribution: CentOS Linux release 7.4.1708 (Core)
- Ray installed from:
pip install -U ray[debug]
- Ray version: 0.7.6
- Python version: 3.6.9
- Exact command to reproduce:
ray start --head
import ray; ray.init()
Describe the problem
I’ll preface by saying a) thanks in advance for any help and b) this issue surfaced on an HPC cluster so it’s possible there are some non-standard things about the cluster configuration. And I was able to get ray installed by building from source, so there is a workaround.
In short, pip-installed ray fails to launch the redis server and so crashes immediately. My hunch is that the subprocess call to redis-server is failing, but I haven't been able to reproduce this at the command line or get more verbose exception info from services.py. Log files are unfortunately empty, so I can only provide the output from runtime (see below).
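One way to test this hunch is to launch the redis-server binary directly and inspect how it exits; empty log files are consistent with a process that died before it could write anything. The following is a minimal sketch, not Ray's own code: describe_exit and run_and_report are hypothetical helper names, and the location of the redis-server binary inside the pip-installed package is deliberately left as an argument because it is not given in this report.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description.

    On POSIX, subprocess reports a negative return code -N when the
    child was killed by signal N (for example -11 for SIGSEGV).
    """
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-returncode, name)
    return "exited with status {}".format(returncode)


def run_and_report(cmd, timeout=10):
    """Run cmd, capture its output, and describe how it ended."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return describe_exit(proc.returncode), proc.stdout, proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the first argument is the path to the redis-server
    # binary shipped with the ray package; locate it in your own install.
    status, out, err = run_and_report([sys.argv[1], "--version"])
    print(status)
    print(out.decode(), err.decode())
```

A result such as "killed by signal 11 (SIGSEGV)" would point at the binary itself crashing rather than at a port or configuration problem.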
Source code / logs
Installation:
conda create -n ray python=3.6 # 3.6 for compatibility with other things
pip install -U ray[debug] # also tried just "ray"
Reproducing error:
$ ray start --head --temp-dir=$LOCAL_SCRATCH
WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.
2019-11-12 10:04:23,529 INFO scripts.py:303 -- Using IP address 10.148.0.29 for this node.
2019-11-12 10:04:23,542 INFO resource_spec.py:205 -- Starting Ray with 62.16 GiB memory available for workers and up to 18.63 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2019-11-12 10:04:23,648 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:23,751 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:23,853 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:23,956 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,058 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,161 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,263 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,366 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,468 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,571 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,673 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,776 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,879 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:24,981 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:25,084 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:25,186 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:25,289 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:25,391 WARNING services.py:822 -- Redis failed to start, retrying now.
2019-11-12 10:04:25,493 WARNING services.py:822 -- Redis failed to start, retrying now.
Traceback (most recent call last):
File "/home/dbiagion/.conda-envs/ray-test/bin/ray", line 8, in <module>
sys.exit(main())
File "/home/dbiagion/.conda-envs/ray-test/lib/python3.6/site-packages/ray/scripts/scripts.py", line 808, in main
return cli()
File "/home/dbiagion/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/dbiagion/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/dbiagion/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/dbiagion/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/dbiagion/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/dbiagion/.conda-envs/ray-test/lib/python3.6/site-packages/ray/scripts/scripts.py", line 314, in start
node = ray.node.Node(ray_params, head=True, shutdown_at_exit=block)
File "/home/dbiagion/.conda-envs/ray-test/lib/python3.6/site-packages/ray/node.py", line 149, in __init__
self.start_head_processes()
File "/home/dbiagion/.conda-envs/ray-test/lib/python3.6/site-packages/ray/node.py", line 571, in start_head_processes
self.start_redis()
File "/home/dbiagion/.conda-envs/ray-test/lib/python3.6/site-packages/ray/node.py", line 426, in start_redis
include_java=self._ray_params.include_java)
File "/home/dbiagion/.conda-envs/ray-test/lib/python3.6/site-packages/ray/services.py", line 660, in start_redis
stderr_file=redis_stderr_file)
File "/home/dbiagion/.conda-envs/ray-test/lib/python3.6/site-packages/ray/services.py", line 846, in _start_redis_instance
stdout_file.name, stderr_file.name))
Exception: Couldn't start Redis. Check log files: /tmp/scratch/session_2019-11-12_10-04-23_529834_390815/logs/redis.out /tmp/scratch/session_2019-11-12_10-04-23_529834_390815/logs/redis.err
Empty log files:
$ cat /tmp/scratch/session_2019-11-12_10-04-23_529834_390815/logs/redis.err
$ cat /tmp/scratch/session_2019-11-12_10-04-23_529834_390815/logs/redis.out
Thank you!
Issue Analytics
- Created 4 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
We figured out what was happening here. In case anyone else is running on a cluster and having similar issues, we traced the issue to a library called libxalt_init.so, used for program monitoring. For whatever reason, this library causes the redis binary to segfault when it's on the LD path. Our fix was to unset the variable enabling this library. I can imagine the libxalt library may live on different paths for different clusters, but hopefully this will get someone pointed in the right direction if encountering a similar issue!
Thanks for the reply. ray stop seems ok; ray start --head yields the same error as above, Redis failed to start, retrying now.
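For anyone wanting to try the libxalt fix without changing their shell setup, here is a rough sketch of starting a command with the library filtered out of the environment. The comment does not name the variable that was unset, so this assumes the library is enabled through one of the dynamic linker's LD_* variables (for example LD_PRELOAD or LD_LIBRARY_PATH); strip_library_from_env is a hypothetical helper, and the "xalt" substring match is only a guess at how the entries look on a given cluster.

```python
import os
import re


def strip_library_from_env(env, needle="xalt"):
    """Return a copy of env without LD_* entries that mention needle.

    Assumption: the offending library is enabled via an LD_* variable.
    Entries are split on colons or whitespace; a variable that ends up
    empty is dropped, and variables with no match are left untouched.
    """
    cleaned = {}
    for name, value in env.items():
        if name.startswith("LD_"):
            parts = [p for p in re.split(r"[:\s]+", value) if p]
            kept = [p for p in parts if needle not in p.lower()]
            if len(kept) != len(parts):
                if not kept:
                    continue  # nothing left: unset the variable entirely
                value = ":".join(kept)
        cleaned[name] = value
    return cleaned


if __name__ == "__main__":
    # Show which variables would change, without launching anything.
    before = dict(os.environ)
    after = strip_library_from_env(before)
    for name in sorted(before):
        if before[name] != after.get(name):
            print(name, "->", after.get(name, "<unset>"))
```

The cleaned mapping can then be passed as env= to subprocess.run(["ray", "start", "--head"], ...) so that only the Ray processes run without the monitoring library.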