Outputs from runs with same random seed are not identical
Description of the bug
I have been unable to get reproducible results when using the same seed for the random number generators.
Code example
Starting from the example described at
https://stable-baselines.readthedocs.io/en/master/modules/ppo1.html
I created the following script:
import gym
from stable_baselines.common.policies import MlpPolicy, MlpLstmPolicy, MlpLnLstmPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO1
env = gym.make('CartPole-v1')
env = DummyVecEnv([lambda: env])
model = PPO1(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=5000, seed=100)
model.save("ppo1_cartpole")
Note that I have added seed=100 to model.learn().
Running this example prints output to the screen and writes the ppo1_cartpole.pkl file.
Running the exact same code twice (with the same seed value) produces different screen outputs and different ppo1_cartpole.pkl files.
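One way to make the "different .pkl files" observation concrete is to compare the saved files by hash. The following is a minimal sketch using only the Python standard library; the helper names `file_digest` and `same_contents` and the file paths in the usage note are hypothetical. It assumes the saved file contains no run-specific metadata such as timestamps, so byte-for-byte equality is a stricter test than equality of the learned parameters.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Return True if both files have byte-identical contents."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, after saving the two runs under different (hypothetical) names, `same_contents("run1_ppo1_cartpole.pkl", "run2_ppo1_cartpole.pkl")` would return False for the behaviour described above.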
System Info
My environment:
- Installed by pip into virtual environment.
- Stable Baselines version 2.3.0
- Python version 3.5.2 is installed in the virtual environment.
- Tensorflow version 1.12.0.
- OpenAI Gym version 0.10.9.
- My OS is Ubuntu-16.04.
- No GPUs.
Additional context
It appears from the code that when seed is not None in learn(), the function set_global_seeds(seed) is called. I can see that this function initialises the following random number generators with the specified seed:
def set_global_seeds(seed):
    """
    set the seed for python random, tensorflow, numpy and gym spaces

    :param seed: (int) the seed
    """
    tf.set_random_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    gym.spaces.prng.seed(seed)
Because of this I also tried including the code lines
from stable_baselines.common import set_global_seeds
set_global_seeds(100)
before the call to gym.make() in the above example, but it did not help.
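One possible reason why seeding the global generators is not enough (an assumption, not something confirmed in this issue) is that a component may own a private random number generator that the global seed never reaches. The sketch below illustrates that general effect with the standard library only. `ToyEnv` and `first_observation` are hypothetical names and do not correspond to any Gym or Stable Baselines API.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an environment that owns a private RNG."""

    def __init__(self):
        # A separate generator, seeded from OS entropy or the clock.
        # Calling random.seed() does not affect it.
        self._rng = random.Random()

    def seed(self, seed):
        """Seed the private generator explicitly."""
        self._rng.seed(seed)

    def reset(self):
        """Return a toy initial observation drawn from the private RNG."""
        return self._rng.uniform(-0.05, 0.05)


def first_observation(global_seed, env_seed=None):
    """Seed the global RNG, optionally seed the env, and reset once."""
    random.seed(global_seed)
    env = ToyEnv()
    if env_seed is not None:
        env.seed(env_seed)
    return env.reset()
```

In this toy setup, `first_observation(100)` will almost certainly differ between calls even though the global seed is fixed, while `first_observation(100, env_seed=100)` returns the same value every time. Whether the same mechanism is behind the PPO1 behaviour reported here would need to be checked against the library code.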
Issue Analytics
- Created 5 years ago
- Comments:9
Top GitHub Comments
Hello, if you look at the roadmap and the milestones, this is planned for the next releases. However, there is no due date, and we would appreciate contributions to help us finish it.
@crobarcro @pstansell I tweaked the code a bit and managed to get reproducible results for A2C, ACER, PPO1, PPO2 and TRPO (it does not work with ACKTR yet, and I did not try the others). You can find details on the deterministic-fix branch.