Exception when using MultiDiscrete action spaces
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
- Ray installed from (source or binary): binary (via pip)
- Ray version: 0.7.6
- Python version: 3.6.9
- Exact command to reproduce: `python rllib_cartpole.py` for the following file:
```python
import gym.envs.classic_control
import ray
from ray import tune


class CustomCartpole(gym.envs.classic_control.CartPoleEnv):
    """Add a dimension to the cartpole action space that is ignored."""

    def __init__(self, env_config):
        super().__init__()
        # if override_actions is false this is just the Cartpole environment
        self.override_actions = env_config['override_actions']
        if self.override_actions:
            # 2 is the environment's normal action space
            # 4 is just a dummy number to give it an extra dimension
            self.action_space = gym.spaces.MultiDiscrete([2, 4])

    def step(self, action):
        # call the cartpole environment with the original action
        if self.override_actions:
            return super().step(action[0])
        else:
            return super().step(action)


def main():
    ray.init()
    tune.run(
        "PPO",
        stop={"episode_reward_mean": 200},
        config={
            "env": CustomCartpole,
            "env_config": {'override_actions': False},
            "num_gpus": 1,
            "num_workers": 1,
            "eager": False,
        },
    )


if __name__ == '__main__':
    main()
```
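For context (a hypothetical sanity check, not part of the original report): a `MultiDiscrete([2, 4])` space samples an array with one entry per sub-space, which is why `step()` forwards only `action[0]` to the underlying CartPole dynamics when `override_actions` is enabled.

```python
import gym

space = gym.spaces.MultiDiscrete([2, 4])
sample = space.sample()
print(sample)                  # e.g. array([1, 3]); only the first entry is a valid CartPole action
print(space.contains(sample))  # True
```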
Describe the problem
I am trying to train on an environment with a MultiDiscrete action space. The file above recreates the issue with a simple adjustment to CartPole. When the file is run with `{'override_actions': False}`, it trains with no problems. However, when using `{'override_actions': True}`, RLlib throws the error pasted below. It looks like this issue was previously discussed in #4866 and #4869.
Source code / logs
Traceback (most recent call last):
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 351, in fetch_result
result = ray.get(trial_future[0])
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/worker.py", line 2121, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(InvalidArgumentError): ray_worker (pid=30940, host=esquires-pc3)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2] vs. [128]
[[{{node default_policy_1/tower_1/gradients_1/default_policy_1/tower_1/add_4_grad/BroadcastGradientArgs}}]]
During handling of the above exception, another exception occurred:
ray_worker (pid=30940, host=esquires-pc3)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 421, in train
raise e
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 407, in train
result = Trainable.train(self)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/tune/trainable.py", line 176, in train
result = self._train()
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _train
fetches = self.optimizer.step()
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 204, in step
self.per_device_batch_size)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_impl.py", line 260, in optimize
return sess.run(fetches, feed_dict=feed_dict)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/esquires/.adt/venv-adt/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2] vs. [128]
[[node default_policy_1/tower_1/gradients_1/default_policy_1/tower_1/add_4_grad/BroadcastGradientArgs (defined at /ray/rllib/agents/ppo/ppo_policy.py:211) ]]
Original stack trace for 'default_policy_1/tower_1/gradients_1/default_policy_1/tower_1/add_4_grad/BroadcastGradientArgs':
File "/ray/workers/default_worker.py", line 98, in <module>
ray.worker.global_worker.main_loop()
File "/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/ray/rllib/agents/trainer.py", line 372, in __init__
Trainable.__init__(self, config, logger_creator)
File "/ray/tune/trainable.py", line 96, in __init__
self._setup(copy.deepcopy(self.config))
File "/ray/rllib/agents/trainer.py", line 492, in _setup
self._init(self.config, self.env_creator)
File "/ray/rllib/agents/trainer_template.py", line 111, in _init
self.optimizer = make_policy_optimizer(self.workers, config)
File "/ray/rllib/agents/ppo/ppo.py", line 89, in choose_policy_optimizer
shuffle_sequences=config["shuffle_sequences"])
File "/ray/rllib/optimizers/multi_gpu_optimizer.py", line 123, in __init__
self.per_device_batch_size, policy.copy))
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 95, in __init__
len(input_placeholders)))
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 297, in _setup_device
graph_obj._loss)
File "/ray/rllib/policy/tf_policy_template.py", line 168, in gradients
return gradients_fn(self, optimizer, loss)
File "/ray/rllib/agents/ppo/ppo_policy.py", line 211, in clip_gradients
return optimizer.compute_gradients(loss, variables)
File "/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "/tensorflow/python/ops/gradients_util.py", line 731, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/tensorflow/python/ops/gradients_util.py", line 403, in _MaybeCompile
return grad_fn() # Exit early
File "/tensorflow/python/ops/gradients_util.py", line 731, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/tensorflow/python/ops/math_grad.py", line 1004, in _AddGrad
rx, ry = gen_array_ops.broadcast_gradient_args(sx, sy)
File "/tensorflow/python/ops/gen_array_ops.py", line 829, in broadcast_gradient_args
"BroadcastGradientArgs", s0=s0, s1=s1, name=name)
File "/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'default_policy_1/tower_1/add_4', defined at:
File "/ray/workers/default_worker.py", line 98, in <module>
ray.worker.global_worker.main_loop()
[elided 11 identical lines from previous traceback]
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 95, in __init__
len(input_placeholders)))
File "/ray/rllib/optimizers/multi_gpu_impl.py", line 295, in _setup_device
graph_obj = self.build_graph(device_input_slices)
File "/ray/rllib/policy/dynamic_tf_policy.py", line 237, in copy
loss = instance._do_loss_init(input_dict)
File "/ray/rllib/policy/dynamic_tf_policy.py", line 353, in _do_loss_init
loss = self._loss_fn(self, self.model, self.dist_class, train_batch)
File "/ray/rllib/agents/ppo/ppo_policy.py", line 146, in ppo_surrogate_loss
model_config=policy.config["model"])
File "/ray/rllib/agents/ppo/ppo_policy.py", line 106, in __init__
vf_loss_coeff * vf_loss - entropy_coeff * curr_entropy)
File "/tensorflow/python/ops/math_ops.py", line 884, in binary_op_wrapper
return func(x, y, name=name)
File "/tensorflow/python/ops/gen_math_ops.py", line 387, in add
"Add", x=x, y=y, name=name)
File "/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Seems like MultiDiscrete works for PyTorch with latest master.
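If switching frameworks is an option, the config change is small. A sketch, assuming a Ray version recent enough to ship a PyTorch PPO policy; the flag was `"use_pytorch": True` in older releases and became `"framework": "torch"` around Ray 1.0:

```python
from ray import tune

tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},
    config={
        "env": CustomCartpole,                     # env class from the repro script above
        "env_config": {"override_actions": True},  # MultiDiscrete action space enabled
        "framework": "torch",                      # older releases: "use_pytorch": True
        "num_workers": 1,
    },
)
```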
I believe our MultiDiscrete support is a bit half baked. Maybe try `Tuple([Discrete(2), Discrete(4)])` instead? This should be an equivalent tuple space.
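A minimal sketch of that workaround applied to the repro environment (hypothetical class name, assuming the rest of the script is unchanged). `Tuple` actions arrive as a tuple of per-component values, so `step()` can index them the same way:

```python
import gym
import gym.envs.classic_control
from gym.spaces import Discrete, Tuple


class CustomCartpoleTuple(gym.envs.classic_control.CartPoleEnv):
    """Same extra, ignored action dimension, expressed as a Tuple space."""

    def __init__(self, env_config):
        super().__init__()
        # Equivalent to MultiDiscrete([2, 4]): the first component is the real CartPole action.
        self.action_space = Tuple([Discrete(2), Discrete(4)])

    def step(self, action):
        # Only the first component drives the underlying CartPole dynamics.
        return super().step(action[0])
```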