Redis LRU eviction causes hangs with ray.wait() (but get() is fine)
System information
- Ray version: 0.6.4
Describe the problem
```python
import ray

@ray.remote
def f():
    return 0

@ray.remote
def g():
    import time
    start = time.time()
    while time.time() < start + 1:
        ray.get([f.remote() for _ in range(10)])

# 10MB -> hangs after ~5 iterations
# 20MB -> hangs after ~20 iterations
# 50MB -> hangs after ~50 iterations
ray.init(redis_max_memory=1024 * 1024 * 50)

i = 0
while True:
    i += 1
    a = g.remote()
    [ok], _ = ray.wait([a])
    print("iter", i)
```
Source code / logs
The above example reproducibly hangs, and the number of iterations it completes before hanging is roughly proportional to the Redis memory limit (see the comments in the script).
The expected behaviour is that once the memory limit is large enough, the loop can run forever.
Also, we shouldn't hang at all: Ray should raise an error when the Redis capacity is too small.
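One possible shape for that fail-fast behaviour, sketched with redis-py (the function name and the idea of polling the eviction counter are assumptions for illustration, not Ray's API): Redis reports an `evicted_keys` counter in its `INFO stats` section, so a client could check it and raise instead of silently waiting on a notification that will never arrive.

```python
import redis

def check_redis_evictions(host="localhost", port=6379):
    # Hypothetical guard: inspect the shard's eviction counter and fail
    # fast if LRU eviction has started, rather than hanging in ray.wait().
    r = redis.StrictRedis(host=host, port=port)
    evicted = r.info("stats").get("evicted_keys", 0)
    if evicted > 0:
        raise RuntimeError(
            "Redis has evicted %d keys; redis_max_memory is too small "
            "for this workload." % evicted)
```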
Top GitHub Comments
Okay, just remembered that object table notifications only get sent to clients who requested notifications for a specific object. And the set of clients to notify is itself stored as a Redis entry. So most likely that key is getting evicted, so the client never receives the notification.
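To make that failure mode concrete, here is a standalone redis-py sketch (a demonstration against a plain Redis server, not Ray's actual code path; the key names are made up): a "who to notify" set is written, unrelated writes push the shard past `maxmemory`, and `allkeys-lru` can evict the notification set along with everything else.

```python
import redis

# Demo only: this reconfigures the server it connects to.
r = redis.StrictRedis(host="localhost", port=6379)
r.config_set("maxmemory", 1024 * 1024)          # 1MB cap, like redis_max_memory
r.config_set("maxmemory-policy", "allkeys-lru")

# The "set of clients to notify" for an object is itself a Redis key.
r.sadd("notify:object123", "client-A")

# Unrelated writes (task/object table churn) push memory past the cap...
for i in range(10000):
    r.set("filler:%d" % i, b"x" * 1024)

# ...and the notification set can be LRU-evicted with everything else.
print("notify set still present?", bool(r.exists("notify:object123")))
```

Once that set is gone, the server has no record that anyone asked to be notified, which matches the observed hang in ray.wait().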
Two options: move the table to the primary shard (which is not subject to LRU eviction), or have the Redis server touch the relevant key whenever a notification is requested.
Never mind, moving the table to the primary shard won’t work since that means moving the table that you’re subscribing to (the object table in this case).
Hmm, @robertnishihara, we could probably have the redis server touch the relevant key (if it exists) whenever a notification is requested.
Edit: Actually, in many cases i think it will probably just work out without touching the object table key. Usually when a client requests notifications from the object table, it’s because the key is currently empty and the client wants to know when the key has a value (i.e. the object has a location). Not sure how this would work out if a client needs notifications for a particular object for a long time, though.
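Both ideas in the comment above can be sketched with redis-py against a plain Redis server; everything here (helper names, key layout) is hypothetical rather than Ray's internals. TOUCH, available since Redis 3.2.1, updates a key's last-access time without reading or writing its value, and silently skips missing keys, so "touch the relevant key (if it exists)" maps onto it directly; a periodic re-touch would cover the long-lived subscription case.

```python
import threading
import redis

r = redis.StrictRedis(host="localhost", port=6379)

def request_notifications(object_id, client_id):
    # Hypothetical subscription path. SADD itself refreshes the LRU clock
    # on the subscriber set; per the suggestion above, also touch the
    # object table key (TOUCH is a no-op if the key does not exist yet).
    r.sadd("notify:%s" % object_id, client_id)
    r.execute_command("TOUCH", "object:%s" % object_id)

def keep_alive(key, interval_s=5.0):
    # For a client that needs notifications on one object for a long
    # time: periodically re-touch the key so it never ages out of the LRU.
    stop = threading.Event()

    def _loop():
        while not stop.wait(interval_s):
            r.execute_command("TOUCH", key)

    threading.Thread(target=_loop, daemon=True).start()
    return stop  # call stop.set() to cancel
```

The periodic touch trades a little extra Redis traffic for predictable retention; it is a mitigation sketch, not a fix for the underlying problem that eviction can silently drop control-plane state.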