
data_phase2 ray actor dies


Hey Dian,

I'm trying to run data_phase2 and I get the following Ray error (it seems to be an issue with the RemoteMainDataset constructor?). As a debugging step, I replaced all the @ray.remote decorators and .remote() calls with their non-Ray equivalents, and the code then runs without issue (although the progress bar didn't move past 0 frames after a minute or two, so I'm not quite sure whether it's supposed to take that long or not).

Have you ever seen anything like this, or do you know what I should do?
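
For context, the swap described above, between running the dataset as a Ray actor and calling it directly, looks roughly like the sketch below. ToyDataset is an illustrative stand-in rather than the actual WorldOnRails class; ray.remote, .remote() and ray.get are the real Ray calls involved. The full error from the Ray run follows after it.

import ray

class ToyDataset:
    # Illustrative stand-in for the dataset class that RemoteMainDataset wraps.
    def __init__(self, frames):
        self._frames = frames

    def num_frames(self):
        return self._frames

ray.init(ignore_reinit_error=True)

# Ray version: wrap the class as an actor; the constructor and every method
# run in a separate worker process, and results come back through ray.get().
RemoteToyDataset = ray.remote(ToyDataset)
actor = RemoteToyDataset.remote(1000)
print(ray.get(actor.num_frames.remote()))

# Non-Ray version (the debugging substitution): construct and call the class
# directly in the current process.
local = ToyDataset(1000)
print(local.num_frames())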

(wor) aaron@Aarons-Machine:~/workspace/carla/WorldOnRails$ RAY_PDB=1 python -m rails.data_phase2 --num-workers=12
Traceback (most recent call last):
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/runpy.py", line 193, in _run_module_as_main
2021-05-29 14:45:49,862 WARNING worker.py:1034 -- Traceback (most recent call last):
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/function_manager.py", line 251, in get_execution_info
    info = self._function_execution_info[job_id][function_id]
KeyError: FunctionID(41f68a98bcf1c9ebc84e01b0819040089631493c)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 550, in ray._raylet.task_execution_handler
  File "python/ray/_raylet.pyx", line 364, in ray._raylet.execute_task
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/function_manager.py", line 256, in get_execution_info
    raise KeyError(message)
KeyError: 'Error occurs in get_execution_info: job_id: JobID(01000000), function_descriptor: {type=PythonFunctionDescriptor, module_name=rails.datasets.main_dataset, class_name=RemoteMainDataset, function_name=__init__, function_hash=084f10af-7af1-46d7-8dda-ada171c2aad9}. Message: FunctionID(41f68a98bcf1c9ebc84e01b0819040089631493c)'
An unexpected internal error occurred while the worker was executing a task.
    "__main__", mod_spec)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/runpy.py", line 85, in _run_code
2021-05-29 14:45:49,862 WARNING worker.py:1034 -- A worker died or was killed while executing task ffffffffffffffffcb230a5701000000.
    exec(code, run_globals)
  File "/home/aaron/workspace/carla/WorldOnRails/rails/data_phase2.py", line 67, in <module>
    main(args)
  File "/home/aaron/workspace/carla/WorldOnRails/rails/data_phase2.py", line 13, in main
    total_frames = ray.get(dataset.num_frames.remote())
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/worker.py", line 1381, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
(wor) aaron@Aarons-Machine:~/workspace/carla/WorldOnRails$
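
For context, the RayActorError raised at ray.get(dataset.num_frames.remote()) is how a failure inside an actor's constructor (or a crash of its worker process) typically surfaces: creating the actor with .remote() does not raise on the driver, and the death only shows up when the first result is fetched. Below is a minimal sketch of that failure mode, with an illustrative Fragile class standing in for RemoteMainDataset.

import ray
from ray.exceptions import RayActorError

@ray.remote
class Fragile:
    def __init__(self):
        # Stand-in for whatever kills the real actor during construction.
        raise RuntimeError("constructor failed")

    def num_frames(self):
        return 0

ray.init(ignore_reinit_error=True)
actor = Fragile.remote()                # no exception is raised here
try:
    ray.get(actor.num_frames.remote())  # the actor's death surfaces here
except RayActorError as exc:
    print("actor died:", exc)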

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

1 reaction
aaronh65 commented, Jun 2, 2021

That worked! Thanks a ton 😃

1 reaction
aaronh65 commented, Jun 1, 2021

@dotchen I messed around with it today and saw some strange behavior, which I've described briefly below. I'm running RAY_PDB=1 python -m rails.data_phase2 --num-workers=1 (by the way, RAILS.md tells users to pass a --num-runners argument rather than the correct --num-workers argument for this phase).

1. With ray local_mode=False: the actor dies as described in the original post.

2. With ray local_mode=True: produces the following error.

(wor) aaron@Aarons-Machine:/data/aaronhua/wor/data/main$ ray debug
2021-06-01 15:56:56,813 INFO scripts.py:193 -- Connecting to Ray instance at 192.168.1.138:6379.
2021-06-01 15:56:56,814 INFO worker.py:657 -- Connecting to existing Ray cluster at address: 192.168.1.138:6379
Active breakpoints:
0: python -m rails.data_phase2 --num-workers=1 | /home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/actor.py:677
Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 456, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/function_manager.py", line 556, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "/home/aaron/workspace/carla/WorldOnRails/rails/rails.py", line 242, in __init__
    self._rails = RAILS(args)
  File "/home/aaron/workspace/carla/WorldOnRails/rails/rails.py", line 27, in __init__
    self.ego_model  = EgoModel(dt=1./args.fps*(args.num_repeat+1)).to(args.device)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/torch/nn/modules/module.py", line 673, in to
    return self._apply(convert)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/torch/nn/modules/module.py", line 409, in _apply
    param_applied = fn(param)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/torch/nn/modules/module.py", line 671, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

Enter breakpoint index or press enter to refresh: 

3. With ray local_mode=True: the only thing that changed here is that I printed out torch.cuda.is_available() right at the beginning of rails.data_phase2's __main__ function (to debug the error above). For some reason this makes it work, and I successfully ran the script on a toy dataset of about 1000 frames in 2-3 minutes. See here: https://wandb.ai/aaronhuang/carla_data_phase2/runs/5flpwvwk?workspace=user-aaronhuang
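
For context, here is a minimal sketch of the two knobs compared above: Ray's local_mode flag and CUDA visibility in whichever process ends up running EgoModel(...).to(args.device). ray.init(local_mode=...), torch.cuda.is_available(), and the num_gpus option on ray.remote are real APIs; treating num_gpus as the relevant setting here is an assumption for illustration, since the thread does not show which change finally resolved the issue.

import ray
import torch

# Checking CUDA in the driver (as in point 3 above) initializes CUDA in this
# process as a side effect; with local_mode=True the actor later runs inline
# in this same process, so it inherits that state.
print("CUDA visible in driver:", torch.cuda.is_available())
ray.init(local_mode=True, ignore_reinit_error=True)

# With local_mode=False, each actor runs in its own worker process, and Ray
# sets CUDA_VISIBLE_DEVICES for that worker from the GPUs the actor declares.
# Requesting a GPU for the actor would look like this:
@ray.remote(num_gpus=1)
class GpuProbe:
    def cuda_visible(self):
        return torch.cuda.is_available()

# Example use: probe = GpuProbe.remote(); print(ray.get(probe.cuda_visible.remote()))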


