
How to use Optuna for custom environments


This isn’t a bug or anything like that, but I wonder if anyone could point me in the right direction.

With the rl-baselines-zoo, one can do this:

python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median

But when you’ve created a custom environment…

env = DummyVecEnv([lambda: RunEnv(...)])
model = A2C(CnnPolicy, env).learn(total_timesteps)

… how can I pass in the Optuna parameters, or is it even possible?

Of course I can create a custom Gym environment, but that’s a bit clunky.

Thankful for feedback

Kind regards

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 7
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

josiahcoad commented, Jun 19, 2020 (7 reactions)

I spent a while trying to get the zoo to work with my custom env, but it kept freezing during training. Finally, I found this simple non-zoo approach, which worked for me with tf 1.15.0 and stable-baselines 2.10.0.

# hide all deprecation warnings from tensorflow
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

import optuna
import gym
import numpy as np

from stable_baselines import PPO2
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.cmd_util import make_vec_env

# https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/master/5_custom_gym_env.ipynb
from custom_env import GoLeftEnv

def optimize_ppo2(trial):
    """ Learning hyperparamters we want to optimise"""
    return {
        'n_steps': int(trial.suggest_loguniform('n_steps', 16, 2048)),
        'gamma': trial.suggest_loguniform('gamma', 0.9, 0.9999),
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-5, 1.),
        'ent_coef': trial.suggest_loguniform('ent_coef', 1e-8, 1e-1),
        'cliprange': trial.suggest_uniform('cliprange', 0.1, 0.4),
        'noptepochs': int(trial.suggest_loguniform('noptepochs', 1, 48)),
        'lam': trial.suggest_uniform('lam', 0.8, 1.)
    }


def optimize_agent(trial):
    """ Train the model and optimize
        Optuna maximises the negative log likelihood, so we
        need to negate the reward here
    """
    model_params = optimize_ppo2(trial)
    env = make_vec_env(lambda: GoLeftEnv(), n_envs=16, seed=0)
    model = PPO2('MlpPolicy', env, verbose=0, nminibatches=1, **model_params)
    model.learn(10000)
    mean_reward, _ = evaluate_policy(model, GoLeftEnv(), n_eval_episodes=10)

    return -1 * mean_reward


if __name__ == '__main__':
    study = optuna.create_study()
    try:
        study.optimize(optimize_agent, n_trials=100, n_jobs=4)
    except KeyboardInterrupt:
        print('Interrupted by keyboard.')
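
Once the study finishes (or you stop it early), the best hyperparameters found so far can be read straight off the study object. This is a minimal sketch using standard Optuna attributes (best_value / best_params), placed after the study.optimize call; nothing here is specific to the snippet above.

    # The study keeps the best trial found so far (raises if no trial has finished yet).
    print('Best mean reward:', -study.best_value)      # objective was the negated reward
    print('Best hyperparameters:', study.best_params)  # dict of sampled values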

araffin commented, Jun 20, 2020 (1 reaction)

Have you considered registering your env instead?

Cf doc: https://github.com/openai/gym/wiki/Environments
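
For reference, registering just means giving the env an id that gym.make (and hence tools like the zoo's train.py) can resolve. A minimal sketch, assuming the GoLeftEnv from the tutorial lives in a custom_env module; the id and module path below are only illustrative:

from gym.envs.registration import register

# Register the custom env under an id string; afterwards gym.make('GoLeft-v0')
# works anywhere this registration code has been imported.
register(
    id='GoLeft-v0',                      # illustrative id
    entry_point='custom_env:GoLeftEnv',  # module:ClassName of the custom env
    max_episode_steps=200,
)

In principle you could then pass --env GoLeft-v0 to the zoo's train.py, provided the module performing the registration gets imported first.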


