[rllib] Offline Learning Bug with SAC
I am getting the following error:
Traceback (most recent call last):
File "F:/02_Projekte/00_Reinforcement-Learning/04_Ray_HPC/04_OfflineLearning.py", line 85, in <module>
Reinforcement.startTraining(disableOutput=disableOutput, resumeTraining=resumeTraining)
File "F:\02_Projekte\00_Reinforcement-Learning\04_Ray_HPC\libs\Reinforcement.py", line 111, in startTraining
self.result = self.agent.train()
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\trainer.py", line 529, in train
raise e
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\trainer.py", line 515, in train
result = Trainable.train(self)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\tune\trainable.py", line 226, in train
result = self.step()
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\trainer_template.py", line 148, in step
res = next(self.train_exec_impl)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 756, in __next__
return next(self.built_iterator)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 1075, in build_union
item = next(it)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 756, in __next__
return next(self.built_iterator)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 791, in apply_foreach
result = fn(item)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\execution\train_ops.py", line 69, in __call__
info = self.workers.local_worker().learn_on_batch(batch)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 855, in learn_on_batch
info_out[pid] = policy.learn_on_batch(batch)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\utils\threading.py", line 21, in wrapper
return func(self, *a, **k)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 334, in learn_on_batch
return self._learn_on_batch_eager(postprocessed_batch)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 71, in _func
return func(*args, **kwargs)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 79, in _func
out = func(*args, **kwargs)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 340, in _learn_on_batch_eager
grads_and_vars, stats = self._compute_gradients(samples)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\utils\threading.py", line 21, in wrapper
return func(self, *a, **k)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 613, in _compute_gradients
loss = loss_fn(self, self.model, self.dist_class, samples)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\sac\sac_tf_policy.py", line 331, in sac_actor_critic_loss
train_batch[SampleBatch.REWARDS] +
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1164, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1486, in _add_dispatch
return gen_math_ops.add_v2(x, y, name=name)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 471, in add_v2
_ops.raise_from_not_ok_status(e, name)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\framework\ops.py", line 6862, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a double tensor but is a float tensor [Op:AddV2]
Process finished with exit code 1
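The failing op reproduces outside RLlib with a two-line TensorFlow snippet; AddV2 simply refuses to mix float precisions, which is exactly the message in the traceback:

import tensorflow as tf

# Raises InvalidArgumentError: "cannot compute AddV2 as input #1(zero-based)
# was expected to be a double tensor but is a float tensor [Op:AddV2]"
tf.constant([1.0], dtype=tf.float64) + tf.constant([1.0], dtype=tf.float32)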
My environment looks like this:
import gym
import numpy as np
from gym import spaces

class environment(gym.Env):
    metadata = {'render.modes': ['console']}  # console

    def __init__(self, env_config):
        super(environment, self).__init__()
        # Define action and observation space
        self.action_space = spaces.Box(low=-5.4, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0, high=255, shape=(8,), dtype=np.float32)
        ...
The same error occurs with the PPO agent.
@astronauti, would you be able to provide a short, self-sufficient reproduction script? I see you posted the data, so I won't need that; just a quick script that shows this error. Then I can debug.
Thanks!
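For reference, a minimal reproduction script would look roughly like the sketch below. The environment name, the data path, and the config values are assumptions for illustration, not taken from the issue; the RLlib API shown is the Ray 1.x style that matches the traceback (SACTrainer, "input", "framework": "tf2").

import numpy as np
import gym
from gym import spaces
import ray
from ray import tune
from ray.rllib.agents import sac

class OfflineEnv(gym.Env):
    # Hypothetical env with the same spaces as in the issue; only used for
    # the space definitions, since training reads from the offline data.
    def __init__(self, env_config=None):
        self.action_space = spaces.Box(low=-5.4, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0, high=255, shape=(8,), dtype=np.float32)

    def reset(self):
        return np.zeros(8, dtype=np.float32)

    def step(self, action):
        return np.zeros(8, dtype=np.float32), 0.0, True, {}

ray.init()
tune.register_env("offline_env", lambda cfg: OfflineEnv(cfg))

config = {
    "framework": "tf2",                     # eager TF, as in the traceback
    "input": "/path/to/experiences.json",   # assumed path to the offline data
    "input_evaluation": [],
}
trainer = sac.SACTrainer(env="offline_env", config=config)
trainer.train()  # raises the InvalidArgumentError shown above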
Changing the reward term in sac_tf_policy.py so that the train_batch value is cast to tf.float32 fixes the error. However, the main problem is the import of the data from the .json file: basically every value of train_batch has to be converted to tf.float32.
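One possible shape of that change inside sac_actor_critic_loss is sketched below; only the train_batch[SampleBatch.REWARDS] term is confirmed by the traceback, while the surrounding variable names are paraphrased from the RLlib source and may not match the file exactly:

# sac_tf_policy.py, sac_actor_critic_loss (approximate context)
rewards = tf.cast(train_batch[SampleBatch.REWARDS], tf.float32)
q_t_selected_target = tf.stop_gradient(
    rewards +
    policy.config["gamma"] ** policy.config["n_step"] * q_tp1_best_masked)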
The same problem happens with the PPO agent; however, there are many more type mismatches there.
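Since the root cause is the JSON-loaded data (JSON numbers deserialize as Python floats, which numpy stores as float64), a more general workaround than patching each policy loss would be to cast the batch columns once on the data side. A hypothetical helper along these lines (not from the thread; SampleBatch is dict-like in Ray 1.x, so iterating its keys works) could be applied to the experiences before they are written to the .json file or right after they are read back:

import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

def cast_batch_to_float32(batch: SampleBatch) -> SampleBatch:
    # Cast every float64 column of the batch to float32 so it matches the
    # policy's float32 tensors (rewards, dones, observations, etc.).
    data = {}
    for key in batch.keys():
        arr = np.asarray(batch[key])
        data[key] = arr.astype(np.float32) if arr.dtype == np.float64 else arr
    return SampleBatch(data)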