[rllib] Offline Learning Bug with SAC
I am getting the following error:
Traceback (most recent call last):
File "F:/02_Projekte/00_Reinforcement-Learning/04_Ray_HPC/04_OfflineLearning.py", line 85, in <module>
Reinforcement.startTraining(disableOutput=disableOutput, resumeTraining=resumeTraining)
File "F:\02_Projekte\00_Reinforcement-Learning\04_Ray_HPC\libs\Reinforcement.py", line 111, in startTraining
self.result = self.agent.train()
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\trainer.py", line 529, in train
raise e
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\trainer.py", line 515, in train
result = Trainable.train(self)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\tune\trainable.py", line 226, in train
result = self.step()
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\trainer_template.py", line 148, in step
res = next(self.train_exec_impl)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 756, in __next__
return next(self.built_iterator)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 1075, in build_union
item = next(it)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 756, in __next__
return next(self.built_iterator)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\util\iter.py", line 791, in apply_foreach
result = fn(item)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\execution\train_ops.py", line 69, in __call__
info = self.workers.local_worker().learn_on_batch(batch)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 855, in learn_on_batch
info_out[pid] = policy.learn_on_batch(batch)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\utils\threading.py", line 21, in wrapper
return func(self, *a, **k)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 334, in learn_on_batch
return self._learn_on_batch_eager(postprocessed_batch)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 71, in _func
return func(*args, **kwargs)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 79, in _func
out = func(*args, **kwargs)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 340, in _learn_on_batch_eager
grads_and_vars, stats = self._compute_gradients(samples)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\utils\threading.py", line 21, in wrapper
return func(self, *a, **k)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\policy\eager_tf_policy.py", line 613, in _compute_gradients
loss = loss_fn(self, self.model, self.dist_class, samples)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\ray\rllib\agents\sac\sac_tf_policy.py", line 331, in sac_actor_critic_loss
train_batch[SampleBatch.REWARDS] +
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1164, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1486, in _add_dispatch
return gen_math_ops.add_v2(x, y, name=name)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 471, in add_v2
_ops.raise_from_not_ok_status(e, name)
File "C:\Users\.conda\envs\ReinfLearnNew\lib\site-packages\tensorflow\python\framework\ops.py", line 6862, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a double tensor but is a float tensor [Op:AddV2]
Process finished with exit code 1
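The failing op reproduces outside RLlib with a two-line TensorFlow snippet; AddV2 simply refuses to mix float precisions, which is exactly the message in the traceback:

import tensorflow as tf

# Raises InvalidArgumentError: "cannot compute AddV2 as input #1(zero-based)
# was expected to be a double tensor but is a float tensor [Op:AddV2]"
tf.constant([1.0], dtype=tf.float64) + tf.constant([1.0], dtype=tf.float32)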
My environment looks like this:
import gym
import numpy as np
from gym import spaces

class environment(gym.Env):
    metadata = {'render.modes': ['console']}  # console

    def __init__(self, env_config):
        super(environment, self).__init__()
        # Define action and observation space
        self.action_space = spaces.Box(low=-5.4, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0, high=255, shape=(8,), dtype=np.float32)
        ...
The same error occurs with the PPO agent.
@astronauti, would you be able to provide a short, self-sufficient reproduction script? I see you posted the data, so I won't need that; just a quick script that shows this error. Then I can debug.
Thanks!
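For reference, a minimal reproduction script would look roughly like the sketch below. The environment name, the data path, and the config values are assumptions for illustration, not taken from the issue; the RLlib API shown is the Ray 1.x style that matches the traceback (SACTrainer, "input", "framework": "tf2").

import numpy as np
import gym
from gym import spaces
import ray
from ray import tune
from ray.rllib.agents import sac

class OfflineEnv(gym.Env):
    # Hypothetical env with the same spaces as in the issue; only used for
    # the space definitions, since training reads from the offline data.
    def __init__(self, env_config=None):
        self.action_space = spaces.Box(low=-5.4, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0, high=255, shape=(8,), dtype=np.float32)

    def reset(self):
        return np.zeros(8, dtype=np.float32)

    def step(self, action):
        return np.zeros(8, dtype=np.float32), 0.0, True, {}

ray.init()
tune.register_env("offline_env", lambda cfg: OfflineEnv(cfg))

config = {
    "framework": "tf2",                     # eager TF, as in the traceback
    "input": "/path/to/experiences.json",   # assumed path to the offline data
    "input_evaluation": [],
}
trainer = sac.SACTrainer(env="offline_env", config=config)
trainer.train()  # raises the InvalidArgumentError shown above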
Changing the reward term in sac_tf_policy.py so that the train_batch value is cast to tf.float32 fixes the error. However, the main problem is the import of the data from the .json file: basically every value of train_batch has to be converted to tf.float32.
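One possible shape of that change inside sac_actor_critic_loss is sketched below; only the train_batch[SampleBatch.REWARDS] term is confirmed by the traceback, while the surrounding variable names are paraphrased from the RLlib source and may not match the file exactly:

# sac_tf_policy.py, sac_actor_critic_loss (approximate context)
rewards = tf.cast(train_batch[SampleBatch.REWARDS], tf.float32)
q_t_selected_target = tf.stop_gradient(
    rewards +
    policy.config["gamma"] ** policy.config["n_step"] * q_tp1_best_masked)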
The same problem happens with the PPO agent; however, there are many more type mismatches there.
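Since the root cause is the JSON-loaded data (JSON numbers deserialize as Python floats, which numpy stores as float64), a more general workaround than patching each policy loss would be to cast the batch columns once on the data side. A hypothetical helper along these lines (not from the thread; SampleBatch is dict-like in Ray 1.x, so iterating its keys works) could be applied to the experiences before they are written to the .json file or right after they are read back:

import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

def cast_batch_to_float32(batch: SampleBatch) -> SampleBatch:
    # Cast every float64 column of the batch to float32 so it matches the
    # policy's float32 tensors (rewards, dones, observations, etc.).
    data = {}
    for key in batch.keys():
        arr = np.asarray(batch[key])
        data[key] = arr.astype(np.float32) if arr.dtype == np.float64 else arr
    return SampleBatch(data)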