[Core] `max_calls=1` crashes ray when many tasks are launched.

What is the problem?

  • Ray crashes with `overflow_cpu_instances[i] == 0 Should not be overflow` when launching many tasks with `max_calls=1`. The full error is at the bottom.

Ray version and other system information (Python version, TensorFlow version, OS):

  • Ray: 1.2.0 (and Ray 2.0.0)
  • Python 3.7.7
  • Docker: rayproject/ray-ml:1.2.0-gpu (but no GPUs used)

Reproduction (REQUIRED)

Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):

On a machine with 2 CPUs (m5.large)

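import ray
import time
ray.init()

# max_calls=1: the worker process exits after running a single task.
@ray.remote(max_calls=1)
def one_sec():
    time.sleep(1)

# Submit 2 tasks every 0.5 seconds, 120 tasks in total.
for _ in range(60):
    one_sec.remote()
    one_sec.remote()
    time.sleep(0.5)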

If the code snippet cannot be run by itself, the issue will be closed with “needs-repro-script”.
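
As a rough back-of-the-envelope check on why this script counts as "many tasks" for a 2-CPU machine, the arithmetic can be sketched as below. The numbers come from the script and the machine description above. The one-CPU-per-task figure is an assumption (it is Ray's default for tasks), and the estimate ignores the extra worker start-up cost that `max_calls=1` adds, so the real backlog is likely larger.

```python
# Numbers taken from the reproduction script and machine description above.
num_cpus = 2              # m5.large
task_seconds = 1.0        # each one_sec task sleeps for 1 second
iterations = 60
tasks_per_iteration = 2
sleep_between = 0.5       # seconds slept between submission rounds

# Assumption: each task reserves 1 CPU (Ray's default for tasks).
cpus_per_task = 1

submitted = iterations * tasks_per_iteration
submit_window = iterations * sleep_between  # seconds spent submitting

# Upper bound on tasks that can finish inside the submission window,
# ignoring worker start-up overhead.
capacity = int(num_cpus / cpus_per_task * submit_window / task_seconds)
backlog = submitted - capacity

print(submitted, capacity, backlog)  # 120 60 60
```

Under these assumptions, tasks are submitted about twice as fast as the machine can run them, so roughly half of them are still queued when the loop ends.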

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Full error:

(raylet) [2021-03-18 09:18:04,554 C 1068 1068] cluster_task_manager.cc:809:  Check failed: overflow_cpu_instances[i] == 0 Should not be overflow
(raylet) [2021-03-18 09:18:04,554 E 1068 1068] logging.cc:415: *** Aborted at 1616084284 (unix time) try "date -d @1616084284" if you are using GNU date ***
(raylet) [2021-03-18 09:18:04,555 E 1068 1068] logging.cc:415: PC: @                0x0 (unknown)
(raylet) [2021-03-18 09:18:04,555 E 1068 1068] logging.cc:415: *** SIGABRT (@0x42c) received by PID 1068 (TID 0x7fbd2d556800) from PID 1068; stack trace: ***
(raylet) [2021-03-18 09:18:04,555 E 1068 1068] logging.cc:415:     @     0x564ba40989ef google::(anonymous namespace)::FailureSignalHandler()
(raylet) [2021-03-18 09:18:04,556 E 1068 1068] logging.cc:415:     @     0x7fbd2d12c980 (unknown)
(raylet) [2021-03-18 09:18:04,556 E 1068 1068] logging.cc:415:     @     0x7fbd2c220fb7 gsignal
(raylet) [2021-03-18 09:18:04,556 E 1068 1068] logging.cc:415:     @     0x7fbd2c222921 abort
(raylet) [2021-03-18 09:18:04,556 E 1068 1068] logging.cc:415:     @     0x564ba3c36e9c _ZN3ray6RayLogD2Ev.cold
(raylet) [2021-03-18 09:18:04,557 E 1068 1068] logging.cc:415:     @     0x564ba3d3b11f ray::raylet::ClusterTaskManager::ReleaseCpuResourcesFromUnblockedWorker()
(raylet) [2021-03-18 09:18:04,558 E 1068 1068] logging.cc:415:     @     0x564ba3ce9737 ray::raylet::NodeManager::HandleDirectCallTaskBlocked()
(raylet) [2021-03-18 09:18:04,559 E 1068 1068] logging.cc:415:     @     0x564ba3ce97e9 ray::raylet::NodeManager::ProcessDirectCallTaskBlocked()
(raylet) [2021-03-18 09:18:04,560 E 1068 1068] logging.cc:415:     @     0x564ba3d277e2 ray::raylet::NodeManager::ProcessClientMessage()
(raylet) [2021-03-18 09:18:04,560 E 1068 1068] logging.cc:415:     @     0x564ba3c866a1 _ZNSt17_Function_handlerIFvSt10shared_ptrIN3ray16ClientConnectionEElRKSt6vectorIhSaIhEEEZNS1_6raylet6Raylet12HandleAcceptERKN5boost6system10error_codeEEUlS3_lS8_E0_E9_M_invokeERKSt9_Any_dataOS3_OlS8_
(raylet) [2021-03-18 09:18:04,562 E 1068 1068] logging.cc:415:     @     0x564ba404430e ray::ClientConnection::ProcessMessage()
(raylet) [2021-03-18 09:18:04,563 E 1068 1068] logging.cc:415:     @     0x564ba40413bc boost::asio::detail::reactive_socket_recv_op<>::do_complete()
(raylet) [2021-03-18 09:18:04,564 E 1068 1068] logging.cc:415:     @     0x564ba4407301 boost::asio::detail::scheduler::do_run_one()
(raylet) [2021-03-18 09:18:04,566 E 1068 1068] logging.cc:415:     @     0x564ba44089a9 boost::asio::detail::scheduler::run()
(raylet) [2021-03-18 09:18:04,567 E 1068 1068] logging.cc:415:     @     0x564ba440ae97 boost::asio::io_context::run()
(raylet) [2021-03-18 09:18:04,568 E 1068 1068] logging.cc:415:     @     0x564ba3c52ce2 main
(raylet) [2021-03-18 09:18:04,568 E 1068 1068] logging.cc:415:     @     0x7fbd2c203bf7 __libc_start_main
(raylet) [2021-03-18 09:18:04,570 E 1068 1068] logging.cc:415:     @     0x564ba3c67da5 (unknown)
zsh: abort (core dumped)  ipython
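
In the log above, the failed check is the only line marked `C`; the stack trace lines that follow are marked `E`. A minimal sketch for pulling such lines out of a longer raylet log is shown below. The regular expression is inferred from the lines in this report only, the field names (`pid`, `tid`) are guesses, and `find_check_failures` is a hypothetical helper, not part of Ray.

```python
import re

# Line layout inferred from the log in this report (not from Ray's docs):
# "(raylet) [<date> <time> <severity> <pid> <tid>] <file>:<line>: <message>"
LINE_RE = re.compile(
    r"^\(raylet\) \[(?P<date>\S+) (?P<time>\S+) (?P<severity>[A-Z]) "
    r"(?P<pid>\d+) (?P<tid>\d+)\] "
    r"(?P<source>[\w.]+):(?P<lineno>\d+):\s*(?P<message>.*)$"
)


def find_check_failures(lines):
    """Return (source file, line number, message) for each line marked C."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("severity") == "C":
            failures.append(
                (
                    match.group("source"),
                    int(match.group("lineno")),
                    match.group("message"),
                )
            )
    return failures


# Three lines copied from the log above.
sample = [
    "(raylet) [2021-03-18 09:18:04,554 C 1068 1068] cluster_task_manager.cc:809:  "
    "Check failed: overflow_cpu_instances[i] == 0 Should not be overflow",
    "(raylet) [2021-03-18 09:18:04,555 E 1068 1068] logging.cc:415: "
    "PC: @                0x0 (unknown)",
    "zsh: abort (core dumped)  ipython",
]

failures = find_check_failures(sample)
print(failures)
```

Lines that do not match the layout, such as the final shell message, are skipped rather than treated as errors.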

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

wuisawesome commented, Apr 21, 2021 (1 reaction)

#15083 fixes it (manually confirmed). Should be able to merge it pretty soon.

sanjaysrikakulam commented, Apr 29, 2021 (0 reactions)

OK, thank you! Looking forward to 1.4.
