[core] Async Actor Task fails when `max_retries=-1`
What is the problem?
An actor task fails when the actor dies, even though the actor is configured with max_retries=-1 and max_restart=-1.
Ray version and other system information (Python version, TensorFlow version, OS):
Reproduction (REQUIRED)
The easiest way to test this is with Serve:
- Add the following method to python/ray/serve/controller.py::ServeController:
    def _test_crash(self):
        os._exit(0)
- Add the following between L44 & L45 (the assert) in python/ray/serve/tests/test_standalone.py:
    with pytest.raises(ray.exceptions.RayActorError):
        ray.get(client._controller._test_crash.remote())
- Run:
    python -m pytest -sv python/ray/serve/tests/test_standalone.py::test_detached_deployment
Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):
If we cannot run your script, we cannot fix your issue.
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
Issue Analytics
- State:
- Created 3 years ago
- Comments:14 (14 by maintainers)

I will work on this issue, as it blocks Serve fault tolerance.
I see. What about just resending the tasks that were already completed? That way, we don't need to modify the receiver logic at all; we can just save the out-of-order task specs on the sender side. I can actually see an argument for this approach, since it follows the same semantics that normal execution provides: execution order follows submission order.
I'm fine with modifying the receiver logic if it's necessary, but I'd prefer not to, since it's nice to keep it free of any recovery logic.
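To make the sender-side idea concrete, here is a minimal sketch of one possible reading of it. It is plain Python and is not Ray's actual implementation: `TaskSpec`, `Sender`, `on_reply`, and `specs_to_resend` are hypothetical names invented for illustration. The assumption is that the sender keeps every task spec until all earlier tasks have also completed, so that after an actor restart it can resend everything from the first incomplete task onward, in submission order, including tasks that had already completed out of order.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TaskSpec:
    """Hypothetical task spec: a sequence number plus an opaque payload."""
    seq_no: int
    payload: str


class Sender:
    """Hypothetical sender-side bookkeeping; not Ray's real submitter."""

    def __init__(self) -> None:
        self._next_seq = 0
        # Specs are stored in submission order until it is safe to drop them.
        self._saved: "OrderedDict[int, TaskSpec]" = OrderedDict()
        self._completed: Set[int] = set()

    def submit(self, payload: str) -> TaskSpec:
        spec = TaskSpec(self._next_seq, payload)
        self._saved[spec.seq_no] = spec
        self._next_seq += 1
        return spec

    def on_reply(self, seq_no: int) -> None:
        """Record a completed task and drop the contiguous completed prefix.

        A task that completes ahead of an earlier, still-running task stays
        saved, so it can be resent if the actor restarts.
        """
        self._completed.add(seq_no)
        while self._saved:
            first = next(iter(self._saved))
            if first not in self._completed:
                break
            del self._saved[first]
            self._completed.discard(first)

    def specs_to_resend(self) -> List[TaskSpec]:
        """Specs to resend after a restart, in original submission order."""
        return list(self._saved.values())


if __name__ == "__main__":
    sender = Sender()
    for name in ("a", "b", "c"):
        sender.submit(name)
    sender.on_reply(1)  # "b" finishes before "a" (possible in an async actor)
    print([s.payload for s in sender.specs_to_resend()])
```

In this sketch the receiver needs no recovery logic: after a restart it simply sees the saved tasks arrive again in submission order. The trade-off is that a task which already completed may run a second time.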