hyperparameter tuning of PPO2 with MlpLstmPolicy using Optuna
I'm trying to tune the hyperparameters of PPO2 with MlpLstmPolicy. Below is my code:
import gym, optuna
import tensorflow as tf
from stable_baselines import PPO2
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.vec_env import DummyVecEnv

def make_env():
    def maker():
        env = gym.make('CartPole-v1')
        return env
    return maker

def ppo2_params(trial):
    n_steps = trial.suggest_categorical('n_steps', [16, 32, 64, 128, 256, 512, 1024, 2048])
    gamma = trial.suggest_categorical('gamma', [0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999])
    learning_rate = trial.suggest_loguniform('lr', 1e-5, 1.)
    ent_coef = trial.suggest_loguniform('ent_coef', 0.00000001, 0.1)
    cliprange = trial.suggest_categorical('cliprange', [0.1, 0.2, 0.3, 0.4])
    noptepochs = trial.suggest_categorical('noptepochs', [1, 5, 10, 20, 30, 50])
    lam = trial.suggest_categorical('lambda', [0.8, 0.9, 0.92, 0.95, 0.98, 0.99, 1.0])
    return {
        'n_steps': n_steps,
        'gamma': gamma,
        'learning_rate': learning_rate,
        'ent_coef': ent_coef,
        'cliprange': cliprange,
        'noptepochs': noptepochs,
        'lam': lam
    }

def optimize_agent(trial):
    n_training_envs = 2
    model_params = ppo2_params(trial)
    envs = DummyVecEnv([make_env() for _ in range(n_training_envs)])
    lstm_model = PPO2('MlpLstmPolicy', envs, nminibatches=n_training_envs, **model_params)
    lstm_model.learn(1e5)
    mean_reward, _ = evaluate_policy(lstm_model, lstm_model.get_env(), n_eval_episodes=10)
    return -1 * mean_reward

n_training_envs = 2
study = optuna.create_study()
study.optimize(optimize_agent, n_trials=100, n_jobs=n_training_envs)
And the following error pops up:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/study.py", line 410, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 105, in _optimize
    f.result()
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 162, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 267, in _run_trial
    raise func_err
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 216, in _run_trial
    value_or_values = func(trial)
  File "<stdin>", line 10, in optimize_agent
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py", line 321, in learn
    for update in range(1, n_updates + 1):
TypeError: 'float' object cannot be interpreted as an integer
It seems the error comes from PPO2. The error persists even when I change n_training_envs from 2 to just 1.
Can anyone help me? Thanks in advance. @araffin
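For context on the traceback: PPO2's learn() derives its update count from total_timesteps (roughly total_timesteps // n_batch), so passing the float literal 1e5 makes n_updates a float, which range() rejects. A minimal repro of the failure mode (the n_batch value here is illustrative, not the exact one PPO2 computes):

total_timesteps = 1e5                   # float literal, as in lstm_model.learn(1e5)
n_batch = 2 * 128                       # illustrative: n_envs * n_steps
n_updates = total_timesteps // n_batch  # float // int -> 390.0, still a float
range(1, n_updates + 1)                 # TypeError: 'float' object cannot be interpreted as an integer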
Top GitHub Comments
My guess is that n_batch or n_timesteps is a float, which makes n_updates a float and raises the error; you should probably do lstm_model.learn(int(1e5)).
@Miffyli My env has to feed in some extra configuration files, so the zoo-style invocation
python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median
does not fit my use case. That's why I define the functions myself.
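For reference, the factory from the question can be extended to thread such configuration files through to the env; a hypothetical sketch (MyConfigEnv and 'configs/env.yaml' are placeholder names, not the commenter's actual code):

import gym
from stable_baselines.common.vec_env import DummyVecEnv

class MyConfigEnv(gym.Wrapper):
    # Hypothetical stand-in: wraps CartPole and records where its extra
    # configuration would come from; a real env would parse the file here.
    def __init__(self, config_path):
        super().__init__(gym.make('CartPole-v1'))
        self.config_path = config_path

def make_env(config_path):
    def maker():
        return MyConfigEnv(config_path)
    return maker

envs = DummyVecEnv([make_env('configs/env.yaml') for _ in range(2)])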