Assertion Error on Seq lens for PPO with Attention only in evaluation.


Search before asking

  • I searched the issues and found no similar issues.

Ray Component

RLlib

What happened + What you expected to happen

I trained a PPO model with attention; the full config is:

{
            "env": "SimpleCryptoEnv",  # "CartPole-v0", #
            # "env_config": config_train,  # The dictionary we built before
            "log_level": "WARNING",
            "framework": "torch",
            "_fake_gpus": False,
            "callbacks": MyCallback,
            "ignore_worker_failures": True,
            "num_workers": 12,  # One worker per agent. You can increase this but it will run fewer parallel trainings.
            "num_envs_per_worker": 1,
            "num_gpus": 1,  # I yet have to understand if using a GPU is worth it, for our purposes, but I think it's not. This way you can train on a non-gpu enabled system.
            "clip_rewards": True,
            # "lr": 1e-4,  # Hyperparameter grid search defined above
            # "gamma": 0.99,  # This can have a big impact on the result and needs to be properly tuned (range is 0 to 1)
            # "lambda": 1.0,
            "observation_filter": "MeanStdFilter",
            "model": {
                "fcnet_hiddens": [256, 256],  # Hyperparameter grid search defined above
                "use_attention": True,
                "attention_use_n_prev_actions": 64,
                "attention_use_n_prev_rewards": 64,
                "vf_share_layers": True,
            },
            #"num_sgd_iter": 10,  # tune.choice([10, 20, 30]),
            "sgd_minibatch_size": 1024, # 128  # tune.choice([128, 512, 2048]),
            "train_batch_size": 32768, # , # 1024 # tune.choice([10000, 20000, 40000]),
            "evaluation_interval": 1,  # Run evaluation on every iteration
            "vf_clip_param": 300000, 
            "evaluation_config": {
                "env_config": config_eval,  # The dictionary we built before (only the overriding keys to use in evaluation)
                "explore": False,  # We don't want to explore during evaluation. All actions have to be repeatable.
            },
        }

It trains properly, but when I try to use it for evaluation, everything crashes:

agent.compute_single_action(input_dict={"obs":obs, "state":[0]*64, "prev_action": 0, "prev_reward": 0})

I’ve also tried passing only the observation (which I think is the proper way: the developer shouldn’t have to work out how the agent wants the previous, non-existent, state; if this is handled automatically in training, it should be handled automatically in evaluation too), but that doesn’t work either.
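For reference, a minimal sketch (not part of the original report; it assumes the trained trainer is available as agent) of the "automatic" route: asking the policy for its initial recurrent/attention state via the standard RLlib Policy.get_initial_state() API and feeding the returned state back in on each step. Whether this works here depends on the attention wrapper actually registering its memory in that initial state, which is what this issue is about.

# Sketch: let the policy report the state tensors it expects instead of
# guessing their shapes. `get_policy()` and `get_initial_state()` are
# standard RLlib APIs; `agent` is the trained PPOTrainer.
policy = agent.get_policy()
state = policy.get_initial_state()  # list of numpy arrays, one per memory slot

# With full_fetch=True the call returns (action, state_out, extra_info),
# so the new state can be passed back in on the next step.
action, state, _ = agent.compute_single_action(
    obs, state=state, prev_action=0, prev_reward=0, full_fetch=True)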

This is the full trace of the error.

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_348/535668899.py in <module>
----> 1 agent.compute_single_action(input_dict={"obs":obs, "state":[0]*64, "prev_action": 0, "prev_reward": 0})

~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\agents\trainer.py in compute_single_action(self, observation, state, prev_action, prev_reward, info, input_dict, policy_id, full_fetch, explore, timestep, episode, unsquash_action, clip_action, unsquash_actions, clip_actions, **kwargs)
   1483         if input_dict is not None:
   1484             input_dict[SampleBatch.OBS] = observation
-> 1485             action, state, extra = policy.compute_single_action(
   1486                 input_dict=input_dict,
   1487                 explore=explore,

~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\policy.py in compute_single_action(self, obs, state, prev_action, prev_reward, info, input_dict, episode, explore, timestep, **kwargs)
    216             episodes = [episode]
    217 
--> 218         out = self.compute_actions_from_input_dict(
    219             input_dict=SampleBatch(input_dict),
    220             episodes=episodes,

~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\torch_policy.py in compute_actions_from_input_dict(self, input_dict, explore, timestep, **kwargs)
    292                 if state_batches else None
    293 
--> 294             return self._compute_action_helper(input_dict, state_batches,
    295                                                seq_lens, explore, timestep)
    296 

~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\utils\threading.py in wrapper(self, *a, **k)
     19         try:
     20             with self._lock:
---> 21                 return func(self, *a, **k)
     22         except AttributeError as e:
     23             if "has no attribute '_lock'" in e.args[0]:

~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\torch_policy.py in _compute_action_helper(self, input_dict, state_batches, seq_lens, explore, timestep)
    932             else:
    933                 dist_class = self.dist_class
--> 934                 dist_inputs, state_out = self.model(input_dict, state_batches,
    935                                                     seq_lens)
    936 

~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\modelv2.py in __call__(self, input_dict, state, seq_lens)
    241 
    242         with self.context():
--> 243             res = self.forward(restored, state or [], seq_lens)
    244 
    245         if isinstance(input_dict, SampleBatch):

~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\torch\attention_net.py in forward(self, input_dict, state, seq_lens)
    345                 state: List[TensorType],
    346                 seq_lens: TensorType) -> (TensorType, List[TensorType]):
--> 347         assert seq_lens is not None
    348         # Push obs through "unwrapped" net's `forward()` first.
    349         wrapped_out, _ = self._wrapped_forward(input_dict, [], None)

AssertionError: 

Versions / Dependencies

ray 2.0.0.dev0, Python 3.8, Windows 10

Reproduction script

import ray
from ray.rllib.agents import ppo
from ray.tune.registry import register_env
from ray.rllib.agents.ppo import DEFAULT_CONFIG
import gym
config = DEFAULT_CONFIG.copy()
config.update(
    {
            "env":  "CartPole-v0", #
            # "env_config": config_train,  # The dictionary we built before
            "log_level": "WARNING",
            "framework": "torch",
            "_fake_gpus": False,
            "callbacks": MyCallback,
            "ignore_worker_failures": True,
            "num_workers": 12,  # One worker per agent. You can increase this but it will run fewer parallel trainings.
            "num_envs_per_worker": 1,
            "num_gpus": 1,  # I yet have to understand if using a GPU is worth it, for our purposes, but I think it's not. This way you can train on a non-gpu enabled system.
            "clip_rewards": True,
            # "lr": 1e-4,  # Hyperparameter grid search defined above
            # "gamma": 0.99,  # This can have a big impact on the result and needs to be properly tuned (range is 0 to 1)
            # "lambda": 1.0,
            "observation_filter": "MeanStdFilter",
            "model": {
                "fcnet_hiddens": [256, 256],  # Hyperparameter grid search defined above
                "use_attention": True,
                "attention_use_n_prev_actions": 64,
                "attention_use_n_prev_rewards": 64,
                "vf_share_layers": True,
            },
            #"num_sgd_iter": 10,  # tune.choice([10, 20, 30]),
            "sgd_minibatch_size": 1024, # 128  # tune.choice([128, 512, 2048]),
            "train_batch_size": 32768, # , # 1024 # tune.choice([10000, 20000, 40000]),
            "evaluation_interval": 1,  # Run evaluation on every iteration
            "vf_clip_param": 300000, 
        }
)

ray.init(num_gpus=1)
agent = ppo.PPOTrainer(config=config, env="CartPole-v0")
env = gym.make("CartPole-v0")

episode_reward = 0
done = False
obs = env.reset()
agent.compute_single_action(input_dict={"obs":obs, "state":[0]*64, "prev_action": 0, "prev_reward": 0})
# or
agent.compute_single_action(obs)
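
For comparison, a hedged sketch (an assumption, not something from the original report) of what an explicitly built attention state would have to look like. It assumes the GTrXL wrapper keeps one memory tensor per transformer unit, shaped [attention_memory_inference, attention_dim]; the keys read below are standard RLlib model-config keys, with their documented defaults as fallbacks.

import numpy as np

# Sketch only: build zero-filled memory tensors matching the attention
# wrapper's expected shapes, then pass them as the `state` argument.
model_cfg = config["model"]
num_units = model_cfg.get("attention_num_transformer_units", 1)
memory = model_cfg.get("attention_memory_inference", 50)
attn_dim = model_cfg.get("attention_dim", 64)

state = [np.zeros((memory, attn_dim), np.float32) for _ in range(num_units)]

action, state_out, _ = agent.compute_single_action(
    obs, state=state, prev_action=0, prev_reward=0,
    full_fetch=True, explore=False)

Whether compute_single_action accepts this directly for the attention wrapper in the reported version is exactly what this issue questions.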

Anything else

@deanwampler @ericl @richardliaw

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
Kylecrif commented, Apr 4, 2022

I’m getting this error as well with "use_lstm": True on ray 1.11.0. I’ve verified that the default fully connected model works fine.

0 reactions
malintha commented, Sep 19, 2022

For an LSTM, you have to pass the state parameter and use full_fetch=True on RLlib 2.0:

state = [np.zeros(params['model_config']['lstm_cell_size'], np.float32),
         np.zeros(params['model_config']['lstm_cell_size'], np.float32)]
action = algo_agent.compute_single_action(observation=obs[agt], state=state,
                                          policy_id=policy_id, explore=False,
                                          full_fetch=True)

This works fine for me with an LSTM model, but what’s still troubling is getting an attention module to work. I will update if I find a solution.
