Redis LRU eviction causes hangs with ray.wait() (but get() is fine)


System information

  • Ray version: 0.6.4

Describe the problem

import time

import ray

@ray.remote
def f():
    return 0

@ray.remote
def g():
    # Keep submitting batches of small tasks for about one second.
    start = time.time()
    while time.time() < start + 1:
        ray.get([f.remote() for _ in range(10)])

# Observed behaviour for different redis_max_memory values:
# 10MB -> hangs after ~5 iterations
# 20MB -> hangs after ~20 iterations
# 50MB -> hangs after ~50 iterations
ray.init(redis_max_memory=1024 * 1024 * 50)

i = 0
while True:
    i += 1
    a = g.remote()
    # The driver eventually hangs here, in ray.wait().
    [ok], _ = ray.wait([a])
    print("iter", i)

Source code / logs

The example above hangs reproducibly, after a number of iterations roughly proportional to the Redis memory size.

The expected behaviour is that, once the memory size is large enough, it can run forever.

Also, if the Redis capacity is too small, we should raise an error instead of hanging.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

stephanie-wang commented, Feb 9, 2019 (1 reaction)

Okay, just remembered that object table notifications only get sent to clients who requested notifications for a specific object, and the set of clients to notify is itself stored as a Redis entry. Most likely that key is getting evicted, so the client never receives the notification.

Two options:

  1. Mark those types of entries as unevictable. It would be fine to never evict them since these entries are short-lived anyway.
  2. Store the clients as a C data structure instead of directly in Redis.
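
To make the suspected failure mode concrete, here is a minimal sketch in plain Python. It is not Ray or Redis code: `TinyLRUStore`, the `subscribers:` and `task:` key names, and the `pinned_prefixes` parameter are all hypothetical stand-ins, with `pinned_prefixes` playing the role of option 1 (marking the entries as unevictable).

```python
from collections import OrderedDict


class TinyLRUStore:
    """Toy stand-in for a Redis shard with LRU eviction (not real Redis)."""

    def __init__(self, max_keys, pinned_prefixes=()):
        self.max_keys = max_keys
        # Keys starting with one of these prefixes are never evicted.
        self.pinned_prefixes = tuple(pinned_prefixes)
        self.data = OrderedDict()  # least recently used key first

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        self._evict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)
        return self.data[key]

    def _evict(self):
        while len(self.data) > self.max_keys:
            # Pick the least recently used key that is not pinned.
            victim = next(
                (k for k in self.data
                 if not k.startswith(self.pinned_prefixes)),
                None,
            )
            if victim is None:
                break  # everything left is pinned
            del self.data[victim]


def run(pinned_prefixes=()):
    """Return the list of clients that end up being notified."""
    store = TinyLRUStore(max_keys=4, pinned_prefixes=pinned_prefixes)
    # A client asks to be notified when object "obj" gets a location.
    store.set("subscribers:obj", {"client-1"})
    # Unrelated task metadata fills the store and triggers eviction.
    for n in range(10):
        store.set("task:%d" % n, "metadata")
    # The object becomes available: look up whom to notify.
    subscribers = store.get("subscribers:obj") or set()
    return sorted(subscribers)


print(run())                    # subscriber set was evicted
print(run(("subscribers:",)))   # subscriber set was pinned
```

In this toy model the first call notifies nobody, which corresponds to a client blocked forever in a wait, while pinning the subscriber entries keeps the notification path intact.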
stephanie-wang commented, Feb 11, 2019 (0 reactions)

Never mind, moving the table to the primary shard won’t work since that means moving the table that you’re subscribing to (the object table in this case).

Hmm, @robertnishihara, we could probably have the redis server touch the relevant key (if it exists) whenever a notification is requested.

Edit: Actually, in many cases I think it will probably just work out without touching the object table key. Usually when a client requests notifications from the object table, it’s because the key is currently empty and the client wants to know when the key has a value (i.e. the object has a location). Not sure how this would work out if a client needs notifications for a particular object for a long time, though.
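
The "touch" idea relies on ordinary LRU behaviour: refreshing a key's recency whenever a notification is requested keeps it away from the eviction end of the queue. The sketch below illustrates only that property with a toy `OrderedDict`-based LRU. The key names, the capacity, and the assumption that a notification request arrives before every unrelated write are all hypothetical, and none of this is Ray or Redis code.

```python
from collections import OrderedDict


def simulate(touch_on_request, capacity=3, unrelated_writes=6):
    """Return True if the object-table key survives all unrelated writes."""
    lru = OrderedDict()  # toy LRU: least recently used key first

    def write(key):
        lru[key] = True
        lru.move_to_end(key)
        while len(lru) > capacity:
            lru.popitem(last=False)  # evict the least recently used key

    write("object:abc")
    for n in range(unrelated_writes):
        # Assumption: a notification request arrives before each write.
        if touch_on_request and "object:abc" in lru:
            lru.move_to_end("object:abc")  # the "touch": refresh recency
        write("task:%d" % n)
    return "object:abc" in lru


print(simulate(touch_on_request=False))
print(simulate(touch_on_request=True))
```

The touch only helps while requests keep arriving. A key that is requested once and then left alone for a long time ages out like any other entry, which is the open question raised in the edit above.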
