PPO2 with MlpLstmPolicy crashes GPU
Describe the bug
While training with PPO2 and MlpLstmPolicy on a custom env, my computer intermittently freezes, yet training continues. When I try to monitor the GPUs with watch -n0.5 nvidia-smi, it loads the first GPU's data, then seems to hang for a while until I see that my second GPU has an error. Even after training finishes, anything that uses a GPU glitches, and I have to reset the computer before I can even run another model. I've run the same training on the same env using just MlpPolicy and it trains just fine (although my problem needs a recurrent network, so I always get bad results), and I can monitor everything without GPU glitches. I thought it might be memory overload, but I don't get anywhere near using all of the RAM or GPU memory.
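As a diagnostic (my suggestion, not something from the original thread), you can restrict TensorFlow to a single GPU before the model is built, which at least tells you whether the second card is the culprit. CUDA_VISIBLE_DEVICES is a standard CUDA environment variable:
import os
# Must be set before TensorFlow creates its session, i.e. before
# PPO2(...) is constructed. "0" keeps only the first GPU visible.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"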
Code example
import gym
import money_maker  # registers the custom 'maker-v0' env with gym

from stable_baselines.common.policies import MlpLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines import PPO2

# Multiprocess environment: one subprocess per env instance.
n_cpu = 128
env = SubprocVecEnv([lambda: gym.make('maker-v0') for _ in range(n_cpu)])

# Note: with a recurrent policy, the number of parallel envs must be a
# multiple of nminibatches (128 / 32 = 4 envs per minibatch here).
model = PPO2(MlpLstmPolicy, env, verbose=1, nminibatches=32,
             tensorboard_log="./ppo2_lstm_21_jan_morn_tensorboard/")
model.learn(total_timesteps=10000000)
model.save("ppo2_maker_lstm")

del model
env.close()
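For reference, reloading the saved model later follows the standard stable-baselines API (PPO2.load is part of the library; an env can be re-attached with set_env before further training):
from stable_baselines import PPO2

# Reload the trained weights; attach an env again before further
# learning or vectorized prediction.
model = PPO2.load("ppo2_maker_lstm")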
System Info
Describe the characteristics of your environment:
- Installed stable-baselines with pip, following the docs
- 2 × GTX 1080 Ti GPUs, driver 410.93
- Python 3.6.5 in a conda environment
- TensorFlow version: 1.12.0
Top GitHub Comments
Awesome!!! No more crashing and logs are about 100x smaller. Good job guys!
Hey,
The logging parameters are usually found in def setup_model(self): in the model file (e.g. stable-baselines/ppo2/ppo2.py), and look like tf.summary.[type]([name], [value]). If you comment out everything except the scalars, that should reduce the logging size by an order of magnitude.
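To make that concrete, here is a minimal, self-contained sketch of the pattern (TF 1.x API to match the reported version 1.12; the tensor names are placeholders, not the actual ones in ppo2.py):
import tensorflow as tf

# Placeholder tensors so the snippet runs on its own; in setup_model()
# these would be the real loss/advantage tensors.
pg_loss = tf.constant(0.0, name="pg_loss")
vf_loss = tf.constant(0.0, name="vf_loss")
advantages = tf.zeros([128], name="advantages")

# Scalar summaries are tiny; keep these.
tf.summary.scalar("policy_loss", pg_loss)
tf.summary.scalar("value_loss", vf_loss)

# Histogram (and image) summaries dominate the event-file size, so
# commenting these out is what shrinks the logs.
# tf.summary.histogram("advantages", advantages)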