Error running Tuple action space
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): RHEL 7.6
- Ray installed from (source or binary): Source
- Ray version: 0.7.6
- Python version: 3.6.8
- Exact command to reproduce:
Describe the problem
I have been running the IMPALA algorithm and trying to use a Tuple action space in my custom env, Tuple(Discrete(9), Box(shape=(1,))). When training, at some point all my trials end up failing with the following error trace…
(pid=3731) Traceback (most recent call last):
(pid=3731) File "/home/svc-tai-dev/virt/algo_36/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
(pid=3731) return fn(*args)
(pid=3731) File "/home/svc-tai-dev/virt/algo_36/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
(pid=3731) options, feed_dict, fetch_list, target_list, run_metadata)
(pid=3731) File "/home/svc-tai-dev/virt/algo_36/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
(pid=3731) run_metadata)
(pid=3731) tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 9 which is outside the valid range of [0, 9). Label values: 9
(pid=3731) [[{{node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
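For context, the failing op is just TF's sparse softmax cross-entropy rejecting a label equal to the number of classes. This is my own minimal sketch outside RLlib (assuming TF 1.x, as in the session-based trace above), which hits the same message:

import numpy as np
import tensorflow as tf  # TF 1.x assumed here, matching the trace above

# Discrete(9) -> 9 logits, so valid labels are 0..8; a label of 9 is out of range.
logits = tf.constant(np.zeros((1, 9), dtype=np.float32))
labels = tf.constant([9], dtype=tf.int32)
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    # On CPU this raises InvalidArgumentError: "Received a label value of 9
    # which is outside the valid range of [0, 9)." (on GPU it may instead
    # silently produce NaN).
    sess.run(loss)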
It is not specific to my env, as I am able to reproduce the error using a toy problem with the same action space, such as the one below…
import gym
import numpy as np
from gym import spaces
from ray import tune
from ray.tune.registry import register_env


class MultiActionEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Tuple((
            spaces.Discrete(9),
            spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)))
        self.observation_space = spaces.Box(
            low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self):
        obs = self.observation_space.sample()
        self.timestep = 0
        return obs

    def step(self, action):
        # print(action)
        obs = self.observation_space.sample()
        reward = np.random.randn()
        done = False
        self.timestep += 1
        if self.timestep > 1000:
            done = True
        return obs, reward, done, {}


register_env("MultiActionEnv-v0", lambda _: MultiActionEnv())

# _name is defined elsewhere in the original script.
tune.run("IMPALA", name=_name, stop={"time_total_s": 10000000},
         config={"num_workers": 2, "env": "MultiActionEnv-v0"},
         checkpoint_at_end=False)
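As a quick standalone sanity check (my addition, not part of the original report), the toy env above can also be exercised outside RLlib to confirm that the Tuple action space itself samples and steps cleanly. Names follow the snippet above:

env = MultiActionEnv()
obs = env.reset()
for _ in range(5):
    action = env.action_space.sample()      # a (discrete, box) pair, e.g. (3, array([0.42], dtype=float32))
    assert env.action_space.contains(action)
    obs, reward, done, info = env.step(action)
    assert np.all(np.isfinite(obs)) and np.isfinite(reward)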
Source code / logs
The full trace in the error file…
Traceback (most recent call last):
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_
trial
result = self.trial_executor.fetch_result(trial)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 351, in fet
ch_result
result = ray.get(trial_future[0])
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/worker.py", line 2121, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray_IMPALA:train() (pid=8331, host=ip-172-31-5-40)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 418, in train
raise e
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 407, in train
result = Trainable.train(self)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/tune/trainable.py", line 176, in train
result = self._train()
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 129,
in _train
fetches = self.optimizer.step()
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/optimizers/async_samples_optimizer.py",
line 136, in step
sample_timesteps, train_timesteps = self._step()
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/optimizers/async_samples_optimizer.py",
line 178, in _step
for train_batch in self.aggregator.iter_train_batches():
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/optimizers/aso_aggregator.py", line 117
, in iter_train_batches
blocking_wait=True, max_yield=max_yield)):
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/optimizers/aso_aggregator.py", line 170
, in _augment_with_replay sample_batch = ray_get_and_free(sample_batch)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/utils/memory.py", line 33, in ray_get_a
nd_free result = ray.get(object_ids)
ray.exceptions.RayTaskError(ValueError): ray_RolloutWorker:sample() (pid=8070, host=ip-172-31-8-59)
File "/home/ubuntu/algo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/ubuntu/algo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/ubuntu/algo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 9 which is outside the valid range of [0, 9). Label values: 9
[[{{node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
During handling of the above exception, another exception occurred:
ray_RolloutWorker:sample() (pid=8070, host=ip-172-31-8-59)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/utils/tf_run_builder.py", line 48, in get
self.feed_dict, os.environ.get("TF_TIMELINE_DIR"))
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/utils/tf_run_builder.py", line 94, in r
un_timeline
fetches = sess.run(ops, feed_dict=feed_dict)
File "/home/ubuntu/algo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, i
n run
run_metadata_ptr)
File "/home/ubuntu/algo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173,
in _run feed_dict_tensor, options, run_metadata) File "/home/ubuntu/algo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run run_metadata) File "/home/ubuntu/algo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 9 which is outside the valid range of [0, 9). Label values: 9
[[node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits
(defined at /ray/rllib/models/tf/tf_action_dist.py:54) ]]
Errors may have originated from an input operation.
Input Source operations connected to node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits:
default_policy/split_1 (defined at /ray/rllib/models/tf/tf_action_dist.py:214)
Original stack trace for 'default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits':
File "/ray/workers/default_worker.py", line 98, in <module>
ray.worker.global_worker.main_loop()
File "/ray/rllib/evaluation/rollout_worker.py", line 348, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/ray/rllib/evaluation/rollout_worker.py", line 764, in _build_policy_map
policy_map[name] = cls(obs_space, act_space, merged_conf)
File "/ray/rllib/policy/tf_policy_template.py", line 143, in __init__
obs_include_prev_action_reward=obs_include_prev_action_reward)
File "/ray/rllib/policy/dynamic_tf_policy.py", line 170, in __init__
action_logp = action_dist.sampled_action_logp()
File "/ray/rllib/models/tf/tf_action_dist.py", line 261, in sampled_action_logp
p = self.child_distributions[0].sampled_action_logp()
File "/ray/rllib/models/tf/tf_action_dist.py", line 41, in sampled_action_logp
return self.logp(self.sample_op)
File "/ray/rllib/models/tf/tf_action_dist.py", line 54, in logp
logits=self.inputs, labels=tf.cast(x, tf.int32))
File "/tensorflow/python/ops/nn_ops.py", line 3342, in sparse_softmax_cross_entropy_with_logits
precise_logits, labels, name=name)
File "/tensorflow/python/ops/gen_nn_ops.py", line 11350, in sparse_softmax_cross_entropy_with_logits
labels=labels, name=name)
File "/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
ray_RolloutWorker:sample() (pid=8070, host=ip-172-31-8-59)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 469, in sample
batches = [self.input_reader.next()]
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 56, in next
batches = [self.get_data()]
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 99, in get_data
item = next(self.rollout_provider)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 327, in _env_runner
active_episodes)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 551, in _do_policy_eval
eval_results[k] = builder.get(v)
File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/rllib/utils/tf_run_builder.py", line 53, in get
self.fetches, self.feed_dict))
ValueError: Error fetching: [TupleActions(batches=[<tf.Tensor 'default_policy/Squeeze_2:0' shape=(?,) dtype=int64>, <tf.Tensor 'default_policy/add_1:0' shape=(?, 1) dtype=float32>]), {'action_prob': <tf.Tensor 'default_policy/Exp_1:0' shape=(?,) dtype=float32>, 'action_logp': <tf.Tensor 'default_policy/add_2:0' shape=(?,) dtype=float32>, 'behaviour_logits': <tf.Tensor 'default_policy/concat:0' shape=(?, 11) dtype=float32>}], feed_dict={<tf.Tensor 'default_policy/observation:0' shape=(?, 54) dtype=float32>: [array([ 1. , 1. , 1. , 1. , 1. ,
0. , 0. , 0. , 0. , 0.70131896,
0.11328208, 0.11328208, -0.76101432, 0.36510177, 0.36162691,
0.03620103, 0.17106617, 0.08926075, 0.38048075, 0.36241551,
0.36211438, 0.34613316, 0.39477825, -0.12288058, 0.27199868,
-1.46890378, -1.40642859, 0.6146765 , 0.64622823, 0.56964214,
1.36563875, 1.09488068, 1.52385215, 1.94669157, 2.40748066,
2.10075465, -1.17808927, -1. , -1. , -1. ,
-1. , -1. , -1. , 0. , 0. ,
0. , -0.05753944, -0.05854531, -0.05952603, 0.19277988,
0.32501844, 0.44820571, 0.00793453, -0.50608655])], <tf.Tensor 'default_policy/action:0' shape=(?, 2) dtype=float32>: [array([ 0., -5441968.])], <tf.Tensor 'default_policy/prev_reward:0' shape=(?,) dtype=float32>: [0.0], <tf.Tensor 'default_policy/PlaceholderWithDefault:0' shape=() dtype=bool
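One note on the shapes in that fetch (my reading, based on RLlib's standard action distributions rather than anything stated in the trace): the behaviour_logits width of 11 is consistent with 9 categorical logits for Discrete(9) plus 2 diag-Gaussian parameters (mean and log-std) for Box(shape=(1,)), which default_policy/split_1 then splits back apart.

# Rough accounting only; the exact layout inside RLlib is an assumption here.
discrete_logits = 9        # one logit per Discrete(9) action
gaussian_params = 2 * 1    # mean + log-std per Box dimension, shape=(1,)
assert discrete_logits + gaussian_params == 11  # matches behaviour_logits shape (?, 11)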
Top GitHub Comments
FWIW, I think such issues can happen if NaNs appear in the policy output. When that happens, you can get out of range errors.
Usually it’s due to the observation or reward somehow becoming NaN, though it could be the policy diverging as well.
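A minimal sketch of that kind of check (a hypothetical wrapper of my own, not an RLlib utility): wrap the env so that non-finite observations or rewards fail fast at the source, instead of surfacing later as out-of-range labels in the loss.

import gym
import numpy as np

class FiniteCheckWrapper(gym.Wrapper):
    """Fail fast if the env ever emits NaN/inf observations or rewards."""

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        assert np.all(np.isfinite(obs)), "non-finite observation from reset()"
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        assert np.all(np.isfinite(obs)), "non-finite observation from step()"
        assert np.isfinite(reward), "non-finite reward from step()"
        return obs, reward, done, info

# e.g. register_env("MultiActionEnv-v0", lambda _: FiniteCheckWrapper(MultiActionEnv()))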
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you’d still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray’s public slack channel.
Thanks again for opening the issue!