
Hyperparameter tuning of PPO2 with MlpLstmPolicy using Optuna

See original GitHub issue

I’m trying to tune the hyperparameters of PPO2 with MlpLstmPolicy. Below is my code:

import gym, optuna
import tensorflow as tf
from stable_baselines import PPO2
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.vec_env import DummyVecEnv

def make_env():
    def maker():
        env = gym.make('CartPole-v1')
        return env
    return maker

def ppo2_params(trial):
    n_steps = trial.suggest_categorical('n_steps', [16, 32, 64, 128, 256, 512, 1024, 2048])
    gamma = trial.suggest_categorical('gamma', [0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999])
    learning_rate = trial.suggest_loguniform('lr', 1e-5, 1.)
    ent_coef = trial.suggest_loguniform('ent_coef', 0.00000001, 0.1)
    cliprange = trial.suggest_categorical('cliprange', [0.1, 0.2, 0.3, 0.4])
    noptepochs = trial.suggest_categorical('noptepochs', [1, 5, 10, 20, 30, 50])
    lam = trial.suggest_categorical('lambda', [0.8, 0.9, 0.92, 0.95, 0.98, 0.99, 1.0])
    return {
        'n_steps': n_steps,
        'gamma': gamma,
        'learning_rate': learning_rate,
        'ent_coef': ent_coef,
        'cliprange': cliprange,
        'noptepochs': noptepochs,
        'lam': lam
    }

def optimize_agent(trial):
    n_training_envs = 2
    model_params = ppo2_params(trial)
    envs = DummyVecEnv([make_env() for _ in range(n_training_envs)])
    lstm_model = PPO2('MlpLstmPolicy', envs, nminibatches=n_training_envs, **model_params)
    lstm_model.learn(1e5)
    mean_reward, _ = evaluate_policy(lstm_model, lstm_model.get_env(), n_eval_episodes=10)
    return -1 * mean_reward

n_training_envs = 2
study = optuna.create_study()
study.optimize(optimize_agent, n_trials=100, n_jobs = n_training_envs)

And the following error pops up:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/study.py", line 410, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 105, in _optimize
    f.result()
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 162, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 267, in _run_trial
    raise func_err
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/optuna/_optimize.py", line 216, in _run_trial
    value_or_values = func(trial)
  File "<stdin>", line 10, in optimize_agent
  File "/home/lzheng/anaconda3/envs/RLPS/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py", line 321, in learn
    for update in range(1, n_updates + 1):
TypeError: 'float' object cannot be interpreted as an integer

It seems the error comes from PPO2. The error persists even when I change n_training_envs from 2 to just 1.
Can anyone help me? Thanks in advance. @araffin
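
For readers hitting the same traceback: the failing frame in PPO2 is for update in range(1, n_updates + 1), and range() only accepts integers, so if the value passed to learn() is a float, the update count derived from it stays a float and triggers exactly this TypeError. A minimal standalone illustration, independent of stable-baselines (the n_batch value below is only an assumed example, not from the issue):

total_timesteps = 1e5                     # a float, as in lstm_model.learn(1e5)
n_batch = 2 * 128                         # n_envs * n_steps (example values)
n_updates = total_timesteps // n_batch    # 390.0 -- floor division of a float stays a float

for update in range(1, n_updates + 1):    # TypeError: 'float' object cannot be interpreted as an integer
    pass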

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 11

Top GitHub Comments

1 reaction
araffin commented, Jul 5, 2021

My guess is that n_batch or n_timesteps is a float, which makes n_updates a float and raises the error; you should probably do lstm_model.learn(int(1e5)).
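
In other words, only the learn() call in the objective needs to change. A sketch of the corrected optimize_agent from the question, with everything else left exactly as posted:

def optimize_agent(trial):
    n_training_envs = 2
    model_params = ppo2_params(trial)
    envs = DummyVecEnv([make_env() for _ in range(n_training_envs)])
    lstm_model = PPO2('MlpLstmPolicy', envs, nminibatches=n_training_envs, **model_params)
    # Pass an int budget so the update count PPO2 derives from it stays an integer
    lstm_model.learn(total_timesteps=int(1e5))
    mean_reward, _ = evaluate_policy(lstm_model, lstm_model.get_env(), n_eval_episodes=10)
    # Optuna minimizes by default, so negate the mean reward
    return -1 * mean_reward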

1 reaction
LeZhengThu commented, Jul 5, 2021

@Miffyli My env has to feed in some extra configuration files, so the zoo-style python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median does not fit my job. That’s why I define the functions myself.
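
For anyone in the same situation, the hand-rolled objective can still pass the extra configuration through the env factory rather than the zoo CLI. A rough sketch, where the env id and the config_path keyword are hypothetical placeholders, not from this issue:

def make_env(config_path):
    # Close over the extra configuration the custom env needs
    def maker():
        # gym.make forwards keyword arguments to the env constructor
        return gym.make('MyCustomEnv-v0', config_path=config_path)  # hypothetical id/kwarg
    return maker

envs = DummyVecEnv([make_env('configs/experiment.yaml') for _ in range(n_training_envs)])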

Read more comments on GitHub >

