hyperparameter tuning of PPO2 with MlpLstmPolicy using Optuna
I'm trying to tune the hyperparameters of PPO2 with MlpLstmPolicy. Below is my code:
import gym, optuna
import tensorflow as tf
from stable_baselines import PPO2
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.vec_env import DummyVecEnv

def make_env():
    def maker():
        env = gym.make('CartPole-v1')
        return env
    return maker

def ppo2_params(trial):
    n_steps = trial.suggest_categorical('n_steps', [16, 32, 64, 128, 256, 512, 1024, 2048])
    gamma = trial.suggest_categorical('gamma', [0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999])
    learning_rate = trial.suggest_loguniform('lr', 1e-5, 1.)
    ent_coef = trial.suggest_loguniform('ent_coef', 0.00000001, 0.1)
    cliprange = trial.suggest_categorical('cliprange', [0.1, 0.2, 0.3, 0.4])
    noptepochs = trial.suggest_categorical('noptepochs', [1, 5, 10, 20, 30, 50])
    lam = trial.suggest_categorical('lambda', [0.8, 0.9, 0.92, 0.95, 0.98, 0.99, 1.0])
    return {
        'n_steps': n_steps,
        'gamma': gamma,
        'learning_rate': learning_rate,
        'ent_coef': ent_coef,
        'cliprange': cliprange,
        'noptepochs': noptepochs,
        'lam': lam
    }

def optimize_agent(trial):
    n_training_envs = 2
    model_params = ppo2_params(trial)
    envs = DummyVecEnv([make_env() for _ in range(n_training_envs)])
    lstm_model = PPO2('MlpLstmPolicy', envs, nminibatches=n_training_envs, **model_params)
    lstm_model.learn(1e5)
    mean_reward, _ = evaluate_policy(lstm_model, lstm_model.get_env(), n_eval_episodes=10)
    return -1 * mean_reward

n_training_envs = 2
study = optuna.create_study()
study.optimize(optimize_agent, n_trials=100, n_jobs=n_training_envs)
And the following error pops up:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/study.py", line 410, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 105, in _optimize
    f.result()
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 162, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 267, in _run_trial
    raise func_err
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 216, in _run_trial
    value_or_values = func(trial)
  File "<stdin>", line 10, in optimize_agent
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py", line 321, in learn
    for update in range(1, n_updates + 1):
TypeError: 'float' object cannot be interpreted as an integer
It seems the error comes from PPO2. The error persists even when I change n_training_envs from 2 to just 1.
Can anyone help me? Thanks in advance. @araffin
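For context on the traceback: PPO2's learn() derives its update count from total_timesteps (roughly total_timesteps // n_batch), so passing the float literal 1e5 makes n_updates a float, which range() rejects. A minimal repro of the failure mode (the n_batch value here is illustrative, not the exact one PPO2 computes):

total_timesteps = 1e5                   # float literal, as in lstm_model.learn(1e5)
n_batch = 2 * 128                       # illustrative: n_envs * n_steps
n_updates = total_timesteps // n_batch  # float // int -> 390.0, still a float
range(1, n_updates + 1)                 # TypeError: 'float' object cannot be interpreted as an integer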
Top GitHub Comments
My guess is that n_batch or n_timesteps is a float, which makes n_updates a float and raises the error; you should probably do lstm_model.learn(int(1e5)).
@Miffyli My env has to feed in some extra configuration files, so the zoo-style invocation
python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median
does not fit my use case. That's why I define the functions myself.
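For reference, the factory from the question can be extended to thread such configuration files through to the env; a hypothetical sketch (MyConfigEnv and 'configs/env.yaml' are placeholder names, not the commenter's actual code):

import gym
from stable_baselines.common.vec_env import DummyVecEnv

class MyConfigEnv(gym.Wrapper):
    # Hypothetical stand-in: wraps CartPole and records where its extra
    # configuration would come from; a real env would parse the file here.
    def __init__(self, config_path):
        super().__init__(gym.make('CartPole-v1'))
        self.config_path = config_path

def make_env(config_path):
    def maker():
        return MyConfigEnv(config_path)
    return maker

envs = DummyVecEnv([make_env('configs/env.yaml') for _ in range(2)])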