Exception when using MultiDiscrete action spaces
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
- Ray installed from (source or binary): binary (via pip)
- Ray version: 0.7.6
- Python version: 3.6.9
- Exact command to reproduce: `python rllib_cartpole.py` for the following file:
```python
import gym.envs.classic_control
import ray
from ray import tune


class CustomCartpole(gym.envs.classic_control.CartPoleEnv):
    """Add a dimension to the cartpole action space that is ignored."""

    def __init__(self, env_config):
        super().__init__()
        # if override_actions is false this is just the Cartpole environment
        self.override_actions = env_config['override_actions']
        if self.override_actions:
            # 2 is the environment's normal action space
            # 4 is just a dummy number to give it an extra dimension
            self.action_space = gym.spaces.MultiDiscrete([2, 4])

    def step(self, action):
        # call the cartpole environment with the original action
        if self.override_actions:
            return super().step(action[0])
        else:
            return super().step(action)


def main():
    ray.init()
    tune.run(
        "PPO",
        stop={"episode_reward_mean": 200},
        config={
            "env": CustomCartpole,
            "env_config": {'override_actions': False},
            "num_gpus": 1,
            "num_workers": 1,
            "eager": False,
        },
    )


if __name__ == '__main__':
    main()
```
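For context (a hypothetical sanity check, not part of the original report): a `MultiDiscrete([2, 4])` space samples an array with one entry per sub-space, which is why `step()` forwards only `action[0]` to the underlying CartPole dynamics when `override_actions` is enabled.

```python
import gym

space = gym.spaces.MultiDiscrete([2, 4])
sample = space.sample()
print(sample)                  # e.g. array([1, 3]); only the first entry is a valid CartPole action
print(space.contains(sample))  # True
```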
Describe the problem
I am trying to train on an environment with a MultiDiscrete action space. The file above recreates the issue with a simple adjustment to CartPole. When the file is run with `{'override_actions': False}`, it trains with no problems. However, when using `{'override_actions': True}`, RLlib throws the error pasted below. It looks like this issue was previously discussed in #4866 and #4869.
Source code / logs
Traceback (most recent call last):
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 351, in fetch_result
result = ray.get(trial_future[0])
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/worker.py", line 2121, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(InvalidArgumentError): ray_worker (pid=30940, host=esquires-pc3)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2] vs. [128]
[[{{node default_policy_1/tower_1/gradients_1/default_policy_1/tower_1/add_4_grad/BroadcastGradientArgs}}]]
During handling of the above exception, another exception occurred:
ray_worker (pid=30940, host=esquires-pc3)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 421, in train
raise e
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 407, in train
result = Trainable.train(self)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/tune/trainable.py", line 176, in train
result = self._train()
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _train
fetches = self.optimizer.step()
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 204, in step
self.per_device_batch_size)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_impl.py", line 260, in optimize
return sess.run(fetches, feed_dict=feed_dict)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2] vs. [128]
[[node default_policy_1/tower_1/gradients_1/default_policy_1/tower_1/add_4_grad/BroadcastGradientArgs (defined at /ray/rllib/agents/ppo/ppo_policy.py:211) ]]
Original stack trace for 'default_policy_1/tower_1/gradients_1/default_policy_1/tower_1/add_4_grad/BroadcastGradientArgs':
File "/ray/workers/default_worker.py", line 98, in <module>
ray.worker.global_worker.main_loop()
File "/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/ray/rllib/agents/trainer.py", line 372, in __init__
Trainable.__init__(self, config, logger_creator)
File "/ray/tune/trainable.py", line 96, in __init__
self._setup(copy.deepcopy(self.config))
File "/ray/rllib/agents/trainer.py", line 492, in _setup
self._init(self.config, self.env_creator)
File "/ray/rllib/agents/trainer_template.py", line 111, in _init
self.optimizer = make_policy_optimizer(self.workers, config)
File "/ray/rllib/agents/ppo/ppo.py", line 89, in choose_policy_optimizer
shuffle_sequences=config["shuffle_sequences"])
File "/ray/rllib/optimizers/multi_gpu_optimizer.py", line 123, in __init__
self.per_device_batch_size, policy.copy))
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 95, in __init__
len(input_placeholders)))
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 297, in _setup_device
graph_obj._loss)
File "/ray/rllib/policy/tf_policy_template.py", line 168, in gradients
return gradients_fn(self, optimizer, loss)
File "/ray/rllib/agents/ppo/ppo_policy.py", line 211, in clip_gradients
return optimizer.compute_gradients(loss, variables)
File "/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "/tensorflow/python/ops/gradients_util.py", line 731, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/tensorflow/python/ops/gradients_util.py", line 403, in _MaybeCompile
return grad_fn() # Exit early
File "/tensorflow/python/ops/gradients_util.py", line 731, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/tensorflow/python/ops/math_grad.py", line 1004, in _AddGrad
rx, ry = gen_array_ops.broadcast_gradient_args(sx, sy)
File "/tensorflow/python/ops/gen_array_ops.py", line 829, in broadcast_gradient_args
"BroadcastGradientArgs", s0=s0, s1=s1, name=name)
File "/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'default_policy_1/tower_1/add_4', defined at:
File "/ray/workers/default_worker.py", line 98, in <module>
ray.worker.global_worker.main_loop()
[elided 11 identical lines from previous traceback]
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 95, in __init__
len(input_placeholders)))
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 295, in _setup_device
graph_obj = self.build_graph(device_input_slices)
File "/ray/rllib/policy/dynamic_tf_policy.py", line 237, in copy
loss = instance._do_loss_init(input_dict)
File "/ray/rllib/policy/dynamic_tf_policy.py", line 353, in _do_loss_init
loss = self._loss_fn(self, self.model, self.dist_class, train_batch)
File "/ray/rllib/agents/ppo/ppo_policy.py", line 146, in ppo_surrogate_loss
model_config=policy.config["model"])
File "/ray/rllib/agents/ppo/ppo_policy.py", line 106, in __init__
vf_loss_coeff * vf_loss - entropy_coeff * curr_entropy)
File "/tensorflow/python/ops/math_ops.py", line 884, in binary_op_wrapper
return func(x, y, name=name)
File "/tensorflow/python/ops/gen_math_ops.py", line 387, in add
"Add", x=x, y=y, name=name)
File "/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Seems like MultiDiscrete works for PyTorch with latest master.
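If switching frameworks is an option, the config change is small. A sketch, assuming a Ray version recent enough to ship a PyTorch PPO policy; the flag was `"use_pytorch": True` in older releases and became `"framework": "torch"` around Ray 1.0:

```python
from ray import tune

tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},
    config={
        "env": CustomCartpole,                     # env class from the repro script above
        "env_config": {"override_actions": True},  # MultiDiscrete action space enabled
        "framework": "torch",                      # older releases: "use_pytorch": True
        "num_workers": 1,
    },
)
```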
I believe our MultiDiscrete support is a bit half baked. Maybe try `Tuple([Discrete(2), Discrete(4)])` instead? This should be an equivalent tuple space.
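A minimal sketch of that workaround applied to the repro environment (hypothetical class name, assuming the rest of the script is unchanged). `Tuple` actions arrive as a tuple of per-component values, so `step()` can index them the same way:

```python
import gym
import gym.envs.classic_control
from gym.spaces import Discrete, Tuple


class CustomCartpoleTuple(gym.envs.classic_control.CartPoleEnv):
    """Same extra, ignored action dimension, expressed as a Tuple space."""

    def __init__(self, env_config):
        super().__init__()
        # Equivalent to MultiDiscrete([2, 4]): the first component is the real CartPole action.
        self.action_space = Tuple([Discrete(2), Discrete(4)])

    def step(self, action):
        # Only the first component drives the underlying CartPole dynamics.
        return super().step(action[0])
```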