RuntimeError: received 0 items of ancdata with custom gym

I’m getting a RuntimeError when I try to run several custom gyms in parallel.

Traceback (most recent call last):
  File "/misc/vlgscratch4/LecunGroup/atcold/anaconda3/envs/OpenAI/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/misc/vlgscratch4/LecunGroup/atcold/anaconda3/envs/OpenAI/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/run.py", line 270, in <module>
    main()
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/run.py", line 244, in main
    model, env = train(args, extra_args)
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/run.py", line 88, in train
    **alg_kwargs
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/ppo2/ppo2.py", line 329, in learn
    obs, returns, masks, actions, values, neglogpacs, states, epinfos = runner.run() #pylint: disable=E0632
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/ppo2/ppo2.py", line 178, in run
    self.obs[:], rewards, self.dones, infos = self.env.step(actions)
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/common/vec_env/__init__.py", line 100, in step
    return self.step_wait()
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/common/vec_env/vec_normalize.py", line 23, in step_wait
    obs, rews, news, infos = self.venv.step_wait()
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/common/vec_env/subproc_vec_env.py", line 70, in step_wait
    results = [remote.recv() for remote in self.remotes]
  File "/home/atcold/Work/GitHub/OpenAI-RL-baselines/baselines/common/vec_env/subproc_vec_env.py", line 70, in <listcomp>
    results = [remote.recv() for remote in self.remotes]
  File "/misc/vlgscratch4/LecunGroup/atcold/anaconda3/envs/OpenAI/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/misc/vlgscratch4/LecunGroup/atcold/anaconda3/envs/OpenAI/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/misc/vlgscratch4/LecunGroup/atcold/anaconda3/envs/OpenAI/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/misc/vlgscratch4/LecunGroup/atcold/anaconda3/envs/OpenAI/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/misc/vlgscratch4/LecunGroup/atcold/anaconda3/envs/OpenAI/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

No clue about how to debug this. Do I need to add any special functionality to my gym to support parallel execution?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
Atcold commented, Oct 10, 2018

Finally debugged this crap. It’s actually quite embarrassing to type out what the cause was… but I’ll do it for the sake of rigour.

My gym step(action) function was returning:

  • observation: a NumPy multidimensional array;
  • reward: a scalar;
  • done: a boolean;
  • info: the whole agent (for debugging purposes).

What I didn’t take into account is that, with multiple processes, the agent has to be serialised and sent through a pipe on every step. My agent contains a crap ton of stuff: pandas tables, a cached pygame font, and more shit.

Sending str(agent) instead (which tells me who the current agent is) fixes the problem.
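
A minimal sketch of that fix, assuming a hypothetical `Agent` class and a hypothetical `make_info` helper (neither comes from the original code). The point is that only a short identifying string ends up in `info`, so what gets pickled and sent through the pipe is a tiny dict of plain strings rather than the whole agent and everything it holds.

```python
import pickle


class Agent:
    """Hypothetical stand-in for a heavy agent object."""

    def __init__(self, name):
        self.name = name
        # Placeholder for large state (tables, cached fonts, ...).
        self.big_state = list(range(100_000))

    def __str__(self):
        return f"Agent({self.name})"


def make_info(agent):
    """Build the `info` value returned by step(): an identifier only."""
    return {"agent": str(agent)}


agent = Agent("car-0")

# This is all that has to cross the process boundary on each step.
payload = pickle.dumps(make_info(agent))
restored = pickle.loads(payload)
print(restored, len(payload))
```

If more than a name is needed for debugging, the same idea applies: copy out the few plain values (numbers, strings, small arrays) you actually want to inspect instead of returning the object itself.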

Now I have a new bug:

self.ret = self.ret * self.gamma + rews
ValueError: operands could not be broadcast together with shapes (12,) (2,)

but this is a new adventure of its own, so I’m closing this issue. Thank you for your interest. I hope I’ve entertained you 😉

0 reactions
Atcold commented, Oct 9, 2018

Alright. Made some progress. ulimit -n returns 1024. Setting ulimit -n 2048 made the script run for longer, but it still died afterwards. So I’m pretty sure I’m hitting “some” limit because resources are not being deallocated somewhere. Now I have to figure out who’s doing this nasty thing.
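
For reference, the same limit can be inspected from inside Python with the standard-library `resource` module (Unix only). This is a sketch, and `raise_soft_limit` is a hypothetical helper name. As noted above, raising the limit only postpones the failure if something keeps leaking descriptors.

```python
import resource

# Soft and hard limits on open file descriptors for this process.
# The soft limit is the value that `ulimit -n` reports in a shell.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft}, hard={hard}")


def raise_soft_limit(target):
    """Raise the soft limit towards `target`, never past the hard limit.

    Returns the soft limit in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `raise_soft_limit(2048)` would mirror the `ulimit -n 2048` experiment for the current process and any children it starts afterwards.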

Cc: @ikostrikov.
