Redis LRU eviction causes hangs with ray.wait() (but get() is fine)
System information
- Ray version: 0.6.4
Describe the problem
```python
import ray

@ray.remote
def f():
    return 0

@ray.remote
def g():
    import time
    start = time.time()
    while time.time() < start + 1:
        ray.get([f.remote() for _ in range(10)])

# 10MB -> hangs after ~5 iterations
# 20MB -> hangs after ~20 iterations
# 50MB -> hangs after ~50 iterations
ray.init(redis_max_memory=1024 * 1024 * 50)

i = 0
while True:
    i += 1
    a = g.remote()
    [ok], _ = ray.wait([a])
    print("iter", i)
```
Source code / logs
The above example reproducibly hangs, and the number of iterations it completes before hanging is roughly proportional to the Redis memory limit (see the comments in the script).
The expected behaviour is that once the memory limit is large enough, the loop can run forever.
Also, we shouldn't hang at all: Ray should raise an error when the Redis capacity is too small.
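One possible shape for that fail-fast behaviour, sketched with redis-py (the function name and the idea of polling the eviction counter are assumptions for illustration, not Ray's API): Redis reports an `evicted_keys` counter in its `INFO stats` section, so a client could check it and raise instead of silently waiting on a notification that will never arrive.

```python
import redis

def check_redis_evictions(host="localhost", port=6379):
    # Hypothetical guard: inspect the shard's eviction counter and fail
    # fast if LRU eviction has started, rather than hanging in ray.wait().
    r = redis.StrictRedis(host=host, port=port)
    evicted = r.info("stats").get("evicted_keys", 0)
    if evicted > 0:
        raise RuntimeError(
            "Redis has evicted %d keys; redis_max_memory is too small "
            "for this workload." % evicted)
```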
Top GitHub Comments
Okay, just remembered that object table notifications only get sent to clients who requested notifications for a specific object. And the set of clients to notify is itself stored as a Redis entry. So most likely that key is getting evicted, so the client never receives the notification.
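To make that failure mode concrete, here is a standalone redis-py sketch (a demonstration against a plain Redis server, not Ray's actual code path; the key names are made up): a "who to notify" set is written, unrelated writes push the shard past `maxmemory`, and `allkeys-lru` can evict the notification set along with everything else.

```python
import redis

# Demo only: this reconfigures the server it connects to.
r = redis.StrictRedis(host="localhost", port=6379)
r.config_set("maxmemory", 1024 * 1024)          # 1MB cap, like redis_max_memory
r.config_set("maxmemory-policy", "allkeys-lru")

# The "set of clients to notify" for an object is itself a Redis key.
r.sadd("notify:object123", "client-A")

# Unrelated writes (task/object table churn) push memory past the cap...
for i in range(10000):
    r.set("filler:%d" % i, b"x" * 1024)

# ...and the notification set can be LRU-evicted with everything else.
print("notify set still present?", bool(r.exists("notify:object123")))
```

Once that set is gone, the server has no record that anyone asked to be notified, which matches the observed hang in ray.wait().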
Two options: move the table to the primary shard (which is not subject to LRU eviction), or have the Redis server touch the relevant key whenever a notification is requested.
Never mind, moving the table to the primary shard won’t work since that means moving the table that you’re subscribing to (the object table in this case).
Hmm, @robertnishihara, we could probably have the redis server touch the relevant key (if it exists) whenever a notification is requested.
Edit: Actually, in many cases i think it will probably just work out without touching the object table key. Usually when a client requests notifications from the object table, it’s because the key is currently empty and the client wants to know when the key has a value (i.e. the object has a location). Not sure how this would work out if a client needs notifications for a particular object for a long time, though.
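Both ideas in the comment above can be sketched with redis-py against a plain Redis server; everything here (helper names, key layout) is hypothetical rather than Ray's internals. TOUCH, available since Redis 3.2.1, updates a key's last-access time without reading or writing its value, and silently skips missing keys, so "touch the relevant key (if it exists)" maps onto it directly; a periodic re-touch would cover the long-lived subscription case.

```python
import threading
import redis

r = redis.StrictRedis(host="localhost", port=6379)

def request_notifications(object_id, client_id):
    # Hypothetical subscription path. SADD itself refreshes the LRU clock
    # on the subscriber set; per the suggestion above, also touch the
    # object table key (TOUCH is a no-op if the key does not exist yet).
    r.sadd("notify:%s" % object_id, client_id)
    r.execute_command("TOUCH", "object:%s" % object_id)

def keep_alive(key, interval_s=5.0):
    # For a client that needs notifications on one object for a long
    # time: periodically re-touch the key so it never ages out of the LRU.
    stop = threading.Event()

    def _loop():
        while not stop.wait(interval_s):
            r.execute_command("TOUCH", key)

    threading.Thread(target=_loop, daemon=True).start()
    return stop  # call stop.set() to cancel
```

The periodic touch trades a little extra Redis traffic for predictable retention; it is a mitigation sketch, not a fix for the underlying problem that eviction can silently drop control-plane state.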