'--cpu' flag causes IndexError: list index out of range

Run: python -m spinup.run ppo --hid [32,32] --env LunarLander-v2 --exp_name installtest --gamma 0.999 --cpu 12 --seed 42

After a random number of epochs, 'IndexError: list index out of range' occurs:

  File "/home/steve/spinningup/spinup/utils/logx.py", line 321, in log_tabular
    vals = np.concatenate(v) if isinstance(v[0], np.ndarray) and len(v[0].shape)>0 else v
IndexError: list index out of range

Despite passing --seed, this is not deterministic, but always seems to happen within the first ~20 epochs. The problem appears to be that v is [], hence the attempt to access v[0] fails.
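To make the failure mode concrete, here is a minimal sketch in plain Python. It does not use spinup itself; `logged` stands in for the `v` in the traceback, and `first_or_none` is a hypothetical guard written for illustration, not something the library provides.

```python
def first_or_none(v):
    """Return v[0], or None when nothing was logged (hypothetical guard)."""
    return v[0] if len(v) > 0 else None


# Stand-in for the logger's list of stored values when nothing was stored.
logged = []

try:
    logged[0]  # same kind of access as v[0] in log_tabular
    failed = False
except IndexError:
    failed = True  # indexing an empty list raises IndexError
```

A guard like this would only hide the symptom; the comments below discuss why the list ends up empty in the first place.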

--cpu auto also has the problem. Only --cpu 1 seems to be safe.

Issue Analytics

Created 5 years ago
Reactions: 1
Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction

richardrl commented, May 1, 2021

I was running into this issue with PPO. Basically, the “steps_per_epoch” parameter must be at least as big as the max episode length of the environment. For CartPole-v1, this is 500, so the “steps_per_epoch” must be > 500.

The default setting in test_ppo.py is 100, which is not enough.

0 reactions

Alberto-Hache commented, Apr 5, 2022

I just stumbled on this same problem and can confirm the reason is what @jachiam guessed above.

I think, however, that the workaround suggested by @richardrl (thanks!) is ONLY correct for non-parallel training. The actual number of steps in the main loop is not steps_per_epoch but local_steps_per_epoch, calculated as local_steps_per_epoch = int(steps_per_epoch / num_procs()). If you set steps_per_epoch to 4000 and --num_cpu is, say, 4, each of the four loops will run for only 1000 steps.
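The effect of that division can be sketched with a simplified model. It assumes every episode lasts exactly `episode_len` steps and that each worker starts a fresh episode at the start of the epoch; `finished_episodes` is a hypothetical helper, and `num_procs` is passed as a plain integer rather than calling spinup's `num_procs()`.

```python
def finished_episodes(steps_per_epoch, num_procs, episode_len):
    """Count episodes one worker completes in one epoch (simplified model)."""
    # Same formula as quoted above, with num_procs given as an int.
    local_steps_per_epoch = int(steps_per_epoch / num_procs)
    # Assumption: episodes have a fixed length and restart every epoch.
    return local_steps_per_epoch // episode_len


single = finished_episodes(4000, 1, 2000)    # one process: 4000 local steps
parallel = finished_episodes(4000, 4, 2000)  # four processes: 1000 local steps each
```

Under these assumptions the single-process run finishes episodes every epoch, while each parallel worker finishes none, which would be consistent with an empty list reaching the logger.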

So what has worked for me (so far) is to set steps_per_epoch to at least max_ep_len times num_procs(), e.g.:

max_ep_len = 2000
--cpu = 4
steps_per_epoch >= 8000
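The rule of thumb above can be written as a small helper. This is only a sketch of the arithmetic; `min_steps_per_epoch` is a hypothetical name, not part of spinup, and the process count is passed in as an integer.

```python
def min_steps_per_epoch(max_ep_len, num_procs):
    """Smallest steps_per_epoch giving each process at least max_ep_len steps."""
    return max_ep_len * num_procs


required = min_steps_per_epoch(max_ep_len=2000, num_procs=4)

# Check against the per-process formula quoted in the comment above.
local_steps_per_epoch = int(required / 4)
```

With the numbers from the example, each of the four processes gets 2000 local steps, which matches max_ep_len.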