[Enhancement] Multiple model iterations per Optuna trial and mean performance objective
I currently have the problem that the results Optuna optimization produces are often far from optimal, due to the stochastic nature of RL training. For example, training 3 agents with the same set of hyperparameters can result in 3 completely different learning curves (at least for the environment I’m training on). Would it make sense to implement the optimization code in such a way that multiple agents are trained per trial, and the mean or median performance is reported to Optuna instead?
In utils/exp_manager.py, hyperparameter_optimization, line 713, I saw your comment “# TODO: eval each hyperparams several times to account for noisy evaluation”. Is that exactly what you mean there?
I already had a look at the code and thought a bit about how one might do that. If anybody is interested, I could implement it and open a pull request!
- Created 2 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
Please do =)
Let’s open a draft PR and continue the discussion there.
If you open a PR, I would be happy to contribute.
Please do =)
By training multiple models simultaneously. Something like:
I was afraid of that answer… yes, it does work, but not for image-based environments, and it requires a beefy machine anyway (for instance, for DQN on Atari, a single model may require 40 GB of RAM).
We also need to check whether model.learn(reset_num_timesteps=False) works well with schedules.
…50 or so models simultaneously, without having memory problems or anything.
I would run at most 3-5 models simultaneously, unless the env is very simple and the network small.