PPO2 MlpLnLstm taking exponentially longer between updates
I am training a PPO2 agent with an MlpLnLstm policy on samples with 7 columns (features), making a total of 7 million 32-bit floats; a relatively small dataset.
My hyperparams are
- n_steps: 1024,
- gamma: 0.999,
- learning_rate: 0.0005,
- ent_coef: 0.04,
- vf_coef: 0.6,
- cliprange: 0.25,
- noptepochs: 4,
- lam: 0.85,
- nminibatches: 1
(everything else, including the network architecture, is left at its default)
and the hardware is
- CPU: AMD Ryzen Threadripper, 12 cores (24 logical CPUs)
- GPU: EVGA (NVIDIA) RTX 2070 Super
- RAM: Corsair Vengeance 32 GB (2 x 16 GB)
I am using 24 actors in parallel to utilise all 24 CPUs when training.
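For reference, here is a minimal sketch of how this setup is wired together on my side; MyEnv, data and make_env are placeholder names standing in for my custom environment and dataset, not the exact code:

```python
# Minimal sketch of the training setup; MyEnv and data are placeholders for
# the custom gym environment and dataframe described in this issue.
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    # hypothetical factory for the custom environment
    return MyEnv(data=data)

env = SubprocVecEnv([make_env for _ in range(24)])  # 24 parallel actors

model = PPO2(
    "MlpLnLstmPolicy",
    env,
    n_steps=1024,
    gamma=0.999,
    learning_rate=0.0005,
    ent_coef=0.04,
    vf_coef=0.6,
    cliprange=0.25,
    noptepochs=4,
    lam=0.85,
    nminibatches=1,
    verbose=1,
)
model.learn(total_timesteps=1050 * 1024 * 24)  # roughly the ~1050 updates mentioned below
```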
I find that when training the model there is massive overhead between batch updates, and it is increasing exponentially with every update: when n_updates was between 1 and 10 it took about 10 seconds between each update; when n_updates was around 180 it took 21 minutes between updates; now that n_updates is 205 it is taking 88 minutes between updates (with the hyperparams and actors set above, we get a total of around 1050 updates). When an update takes place, I see the GPU spin up and complete the update very quickly (around 5 seconds). But in between updates the GPU utilization is at 0% while CPU usage oscillates like a sine wave between 20% and 80%.
I would like to better understand how the CPU and GPU are being utilized by stable-baselines.
Why is there so much CPU overhead between updates? My (custom) gym environment is very simple: the observations are taken directly from the raw data (a csv file loaded into a pandas dataframe), and no transformation or calculation is applied to the observations at each step. The reward is also very quick to calculate (just a simple numpy operation on an array of elements; this array grows by one element with every serial timestep). At first I thought that the untrained agent would “die” (done=True) a lot in the first few iterations, which would make the time between updates very short. But even when the agent stays “alive” for more steps in later iterations, 88 minutes between updates seems far too long. How can each actor (each CPU) stepping through the environment for 1024 steps take so long?
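To give an idea of what the environment does, here is a hypothetical sketch; the class and attribute names are illustrative, not my actual code, and the reward is shown as a plain numpy reduction over the growing array:

```python
# Hypothetical sketch of the environment described above (illustrative names).
import gym
import numpy as np
from gym import spaces

class MyEnv(gym.Env):
    def __init__(self, data):
        super().__init__()
        self.df = data  # pandas DataFrame with the 7 feature columns
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(7,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # placeholder action space
        self.t = 0
        self.history = []  # grows by one element every serial timestep

    def reset(self):
        self.t = 0
        self.history = []
        return self.df.iloc[self.t].values.astype(np.float32)

    def step(self, action):
        self.t += 1
        obs = self.df.iloc[self.t].values.astype(np.float32)  # raw row, no transforms
        self.history.append(float(action))    # the array that grows each step
        reward = float(np.sum(self.history))  # cheap numpy calculation
        done = self.t >= len(self.df) - 1     # or whenever the agent "dies"
        return obs, reward, done, {}
```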
Is there some hidden “minimum loss decrease” parameter? If the algorithm only updates when some minimum loss change between updates is achieved, then maybe that could explain why updates take so long. Something analogous would be the min_delta argument of the Keras EarlyStopping callback. If this is the case, can we change it via some PPO2 parameter or kwarg?
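For clarity, this is the Keras construct I have in mind, shown purely for comparison; the monitor and threshold values here are arbitrary examples:

```python
# Keras analogue, for comparison only; the values are arbitrary examples.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="loss", min_delta=1e-4, patience=3)
# model.fit(..., callbacks=[early_stop])  # stops once loss improvements fall below min_delta
```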
Should I increase the architecture size? Issue #308 seems to suggest that increasing the network size would at least increase GPU utilization at update time (which may also help convergence). I am open to trying this, but I don’t think it really answers my first question above.
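For example (a sketch, not a recommendation; n_lstm=512 is an arbitrary value larger than the policy’s default, and env is the vectorized environment from the setup sketch above):

```python
# Hedged sketch of the "increase the architecture size" idea from issue #308;
# n_lstm=512 is an arbitrary example value.
from stable_baselines import PPO2

bigger_model = PPO2(
    "MlpLnLstmPolicy",
    env,  # the SubprocVecEnv from the setup sketch above
    policy_kwargs=dict(n_lstm=512),
    n_steps=1024,
    nminibatches=1,
    verbose=1,
)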
My ultimate goal is just to have a fully trained agent that has traversed the entire dataset of 1 million samples in a reasonable amount of time (even 2-3 days). Any changes I can make to my gym environment / PPO2 params, or any explanation of how stable-baselines utilises the hardware, would be much appreciated. Thanks
Top GitHub Comments
That behaviour does not sound normal at all, especially if you are using GPU with stable-baselines (it should utilize GPU in spikes for training and CPU only for environments).
Two things pop to my mind:
- Double-check how you create the environment with your data (env = MyEnv(data=data)), just to make sure this is not breaking things.
- Try DummyVecEnv instead of SubprocVecEnv. Since your environment is computationally very simple, using different Python processes (SubprocVecEnv) adds considerable overhead. See the note here for more info.

Hmm, on a glimpse this environment seems alright and should work fine. Have you tried the “MlpLstm” policy rather than the “MlpLnLstm” policy? That one has worked as expected for me in my experiments. Other than that I do not have other suggestions to give, other than starting to debug timings and trying to pin down what takes so long 😕