Worker print output vanishes if not flushed (or if worker unexpectedly dies)

import sys
import ray

@ray.remote
class A:
    def __init__(self):
        print("print 1")
        sys.exit(1)

@ray.remote
class B:
    def __init__(self):
        pass
    def f(self):
        print("print 2")
        sys.exit(1)

@ray.remote
def f():
    print("print 3")
    sys.exit(1)


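ray.init()
A.remote()  # actor constructor prints, then the worker exits
try:
    ray.get(B.remote().f.remote())  # actor method prints, then the worker exits
except Exception:
    pass
try:
    ray.get(f.remote())  # remote task prints, then the worker exits
except Exception:
    pass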

Running this script prints nothing at all. It used to print all three messages ("print 1", "print 2", and "print 3").

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

3 reactions
javierabosch2 commented, Feb 9, 2021

I believe I am seeing the same issue in Ray 1.1.0.

After a worker errors out and I attempt to run the job again, I can neither print to the console nor log warnings using log.warning.

0 reactions
robertnishihara commented, Feb 28, 2019

Ok, there are two issues here.

  1. For the workload in the first comment, the prints actually do happen if the script stays alive, but if the driver exits before the prints get streamed from the log monitor to the driver, then they don’t get printed.

  2. For the Tune workload in https://github.com/ray-project/ray/issues/4082#issuecomment-464623602, the driver was blocked in ray.wait, which didn’t release the GIL, so the logs weren’t getting printed. Fixed in #4190.
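
The "vanishes if not flushed" part of the issue title can be illustrated without Ray at all. The sketch below is a generic Python buffering demo, not Ray's log pipeline: it assumes a child process whose stdout is a pipe (so output is block-buffered) and simulates an unexpected worker death with os._exit, which skips the normal flush at interpreter shutdown. The helper name run_child is hypothetical.

```python
import os
import subprocess
import sys


def run_child(flush: bool) -> str:
    """Run a child that prints once and then dies abruptly; return its stdout."""
    # The child simulates a worker that is killed before Python can flush
    # its buffers: os._exit() terminates immediately, skipping cleanup.
    code = (
        "import os\n"
        f"print('print 1', flush={flush})\n"
        "os._exit(1)\n"
    )
    # Drop PYTHONUNBUFFERED so the child uses default (block) buffering
    # when stdout is a pipe.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.stdout


if __name__ == "__main__":
    print("without flush:", repr(run_child(flush=False)))
    print("with flush:   ", repr(run_child(flush=True)))
```

In a Ray worker, the equivalent precaution would be print(..., flush=True) or an explicit sys.stdout.flush() before the process can exit. Per the first point above, the driver also has to stay alive long enough for the log monitor to stream the output.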
