AssertionError on seq_lens for PPO with attention, only during evaluation.
See original GitHub issue.

Search before asking
- I searched the issues and found no similar issues.
Ray Component
RLlib
What happened + What you expected to happen
I trained a PPO model with attention; this is the full config:
{
    "env": "SimpleCryptoEnv",  # or "CartPole-v0"
    # "env_config": config_train,  # The dictionary we built before
    "log_level": "WARNING",
    "framework": "torch",
    "_fake_gpus": False,
    "callbacks": MyCallback,
    "ignore_worker_failures": True,
    "num_workers": 12,           # One worker per agent. You can increase this, but it will run fewer parallel trainings.
    "num_envs_per_worker": 1,
    "num_gpus": 1,               # I still have to figure out whether a GPU is worth it for our purposes; I think it isn't. This way you can also train on a non-GPU system.
    "clip_rewards": True,
    # "lr": 1e-4,                # Hyperparameter grid search defined above
    # "gamma": 0.99,             # This can have a big impact on the result and needs to be properly tuned (range is 0 to 1)
    # "lambda": 1.0,
    "observation_filter": "MeanStdFilter",
    "model": {
        "fcnet_hiddens": [256, 256],  # Hyperparameter grid search defined above
        "use_attention": True,
        "attention_use_n_prev_actions": 64,
        "attention_use_n_prev_rewards": 64,
        "vf_share_layers": True,
    },
    # "num_sgd_iter": 10,        # tune.choice([10, 20, 30])
    "sgd_minibatch_size": 1024,  # 128  # tune.choice([128, 512, 2048])
    "train_batch_size": 32768,   # 1024  # tune.choice([10000, 20000, 40000])
    "evaluation_interval": 1,    # Run evaluation on every iteration
    "vf_clip_param": 300000,
    "evaluation_config": {
        "env_config": config_eval,  # The dictionary we built before (only the keys to override for evaluation)
        "explore": False,           # We don't want to explore during evaluation; all actions have to be repeatable.
    },
}
It trains properly, but when I try to use it for evaluation, everything crashes:
agent.compute_single_action(input_dict={"obs": obs, "state": [0]*64, "prev_action": 0, "prev_reward": 0})
I have also tried passing only the observation (which I think should be the proper way: it makes no sense for the developer to have to figure out how the agent wants its previous, non-existent states; if this is handled automatically during training, there should be no problem handling it automatically in evaluation), but it doesn't work that way either.
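For context, my understanding (an assumption on my part, I have not found this documented) is that the GTrXL attention wrapper keeps a list of memory tensors as its recurrent state, one per transformer unit, each shaped [attention_memory_inference, attention_dim], which with the defaults (1 unit, memory 50, dim 64) would mean hand-building something like this sketch:

import numpy as np

# Zero-filled attention memory: one tensor per transformer unit, shaped
# [attention_memory_inference, attention_dim] (assumed defaults: 50 and 64).
init_state = [np.zeros((50, 64), dtype=np.float32)]

agent.compute_single_action(
    input_dict={
        "obs": obs,
        "state_in_0": init_state[0],  # SampleBatch-style key for the first state tensor
        "prev_action": 0,
        "prev_reward": 0,
    }
)

Having to guess this by hand is exactly the kind of discovery I mean above.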
This is the full trace of the error.
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_348/535668899.py in <module>
----> 1 agent.compute_single_action(input_dict={"obs":obs, "state":[0]*64, "prev_action": 0, "prev_reward": 0})
~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\agents\trainer.py in compute_single_action(self, observation, state, prev_action, prev_reward, info, input_dict, policy_id, full_fetch, explore, timestep, episode, unsquash_action, clip_action, unsquash_actions, clip_actions, **kwargs)
1483 if input_dict is not None:
1484 input_dict[SampleBatch.OBS] = observation
-> 1485 action, state, extra = policy.compute_single_action(
1486 input_dict=input_dict,
1487 explore=explore,
~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\policy.py in compute_single_action(self, obs, state, prev_action, prev_reward, info, input_dict, episode, explore, timestep, **kwargs)
216 episodes = [episode]
217
--> 218 out = self.compute_actions_from_input_dict(
219 input_dict=SampleBatch(input_dict),
220 episodes=episodes,
~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\torch_policy.py in compute_actions_from_input_dict(self, input_dict, explore, timestep, **kwargs)
292 if state_batches else None
293
--> 294 return self._compute_action_helper(input_dict, state_batches,
295 seq_lens, explore, timestep)
296
~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\utils\threading.py in wrapper(self, *a, **k)
19 try:
20 with self._lock:
---> 21 return func(self, *a, **k)
22 except AttributeError as e:
23 if "has no attribute '_lock'" in e.args[0]:
~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\torch_policy.py in _compute_action_helper(self, input_dict, state_batches, seq_lens, explore, timestep)
932 else:
933 dist_class = self.dist_class
--> 934 dist_inputs, state_out = self.model(input_dict, state_batches,
935 seq_lens)
936
~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\modelv2.py in __call__(self, input_dict, state, seq_lens)
241
242 with self.context():
--> 243 res = self.forward(restored, state or [], seq_lens)
244
245 if isinstance(input_dict, SampleBatch):
~\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\torch\attention_net.py in forward(self, input_dict, state, seq_lens)
345 state: List[TensorType],
346 seq_lens: TensorType) -> (TensorType, List[TensorType]):
--> 347 assert seq_lens is not None
348 # Push obs through "unwrapped" net's `forward()` first.
349 wrapped_out, _ = self._wrapped_forward(input_dict, [], None)
AssertionError:
Versions / Dependencies
- ray 2.0.0.dev0
- Python 3.8
- Windows 10
Reproduction script
import ray
import gym
from ray.rllib.agents import ppo
from ray.rllib.agents.ppo import DEFAULT_CONFIG
from ray.tune.registry import register_env

config = DEFAULT_CONFIG.copy()
config.update(
    {
        "env": "CartPole-v0",
        # "env_config": config_train,  # The dictionary we built before
        "log_level": "WARNING",
        "framework": "torch",
        "_fake_gpus": False,
        # "callbacks": MyCallback,  # custom callback class, not defined in this snippet and not needed to reproduce
        "ignore_worker_failures": True,
        "num_workers": 12,           # One worker per agent. You can increase this, but it will run fewer parallel trainings.
        "num_envs_per_worker": 1,
        "num_gpus": 1,               # I still have to figure out whether a GPU is worth it for our purposes; I think it isn't. This way you can also train on a non-GPU system.
        "clip_rewards": True,
        # "lr": 1e-4,                # Hyperparameter grid search defined above
        # "gamma": 0.99,             # This can have a big impact on the result and needs to be properly tuned (range is 0 to 1)
        # "lambda": 1.0,
        "observation_filter": "MeanStdFilter",
        "model": {
            "fcnet_hiddens": [256, 256],  # Hyperparameter grid search defined above
            "use_attention": True,
            "attention_use_n_prev_actions": 64,
            "attention_use_n_prev_rewards": 64,
            "vf_share_layers": True,
        },
        # "num_sgd_iter": 10,        # tune.choice([10, 20, 30])
        "sgd_minibatch_size": 1024,  # 128  # tune.choice([128, 512, 2048])
        "train_batch_size": 32768,   # 1024  # tune.choice([10000, 20000, 40000])
        "evaluation_interval": 1,    # Run evaluation on every iteration
        "vf_clip_param": 300000,
    }
)

ray.init(num_gpus=1)
agent = ppo.PPOTrainer(config=config, env="CartPole-v0")

env = gym.make("CartPole-v0")
episode_reward = 0
done = False
obs = env.reset()

agent.compute_single_action(input_dict={"obs": obs, "state": [0]*64, "prev_action": 0, "prev_reward": 0})
# or
agent.compute_single_action(obs)
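To see which inputs the attention wrapper actually expects at inference time, inspecting the policy's view requirements might help. This is only a diagnostic sketch, assuming the trajectory-view API exposes view_requirements on the policy in this version:

policy = agent.get_policy()

# Each ViewRequirement describes one input column the model wants,
# including the state_in_* memory tensors and the prev-action/reward shifts.
for key, view_req in policy.view_requirements.items():
    print(key, getattr(view_req, "shift", None), getattr(view_req, "space", None))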
Anything else
@deanwampler @ericl @richardliaw
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Top GitHub Comments
Getting this error as well for "use_lstm": True on ray 1.11.0. I've verified that using the default fully-connected model works fine.

For the LSTM, you have to pass the state parameter with full_fetch=True on RLlib 2.0. This works fine for me for an LSTM module, but what's troubling is getting an attention module working. I will update if I can find a solution to that.
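For reference, the LSTM workaround described above looks roughly like this in a plain gym rollout loop (a sketch, not a verified fix for the attention case; it assumes get_initial_state() returns the zero-filled LSTM state):

policy = agent.get_policy()
state = policy.get_initial_state()  # zero-filled [h, c] for the LSTM wrapper
prev_action = 0
prev_reward = 0.0

obs = env.reset()
done = False
while not done:
    # With full_fetch=True, compute_single_action returns (action, state_out, extra_fetches).
    action, state, _ = agent.compute_single_action(
        obs,
        state=state,
        prev_action=prev_action,
        prev_reward=prev_reward,
        full_fetch=True,
    )
    obs, reward, done, info = env.step(action)
    prev_action, prev_reward = action, reward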