
PPO2 MlpLnLstm taking exponentially longer between updates

See original GitHub issue

I am training a PPO2 agent with the MlpLnLstm policy on samples with 7 columns (features), making a total of 7 million 32-bit floats; a relatively small dataset.

My hyperparams are

  • n_steps: 1024,
  • gamma: 0.999,
  • learning_rate: 0.0005,
  • ent_coef: 0.04,
  • vf_coef: 0.6,
  • cliprange: 0.25,
  • noptepochs: 4,
  • lam: 0.85,
  • nminibatches: 1

(everything else, including the network architecture, is left at the default values)

and the hardware is

  • CPU: AMD Ryzen Threadripper, 12 cores (24 threads)
  • GPU: EVGA (NVIDIA) RTX 2070 Super
  • RAM: Corsair Vengeance 32 GB (2 x 16 GB)

I am using 24 actors in parallel to utilise all 24 logical CPUs when training.
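
For reference, this is roughly how the training is set up (MyEnv stands for my custom gym environment and data for the dataframe; the exact script differs slightly):

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

# MyEnv and `data` are placeholders for my custom gym environment and the
# pandas dataframe holding the csv data.
def make_env():
    def _init():
        return MyEnv(data=data)
    return _init

env = SubprocVecEnv([make_env() for _ in range(24)])  # 24 parallel actors

model = PPO2(
    "MlpLnLstmPolicy",
    env,
    n_steps=1024,
    gamma=0.999,
    learning_rate=0.0005,
    ent_coef=0.04,
    vf_coef=0.6,
    cliprange=0.25,
    noptepochs=4,
    lam=0.85,
    nminibatches=1,
    verbose=1,
)
model.learn(total_timesteps=25_000_000)  # roughly 1050 updates at 24 x 1024 steps per update
```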

I find that when training the model there is massive overhead between batch updates, and it is increasing exponentially with every update. When n_updates was between 1 and 10 it took about 10 seconds between updates; around update 180 it was taking 21 minutes between updates; at update 205 it is taking 88 minutes between updates (with the hyperparams and actor count above, we get around 1050 updates in total). When the update itself takes place, I see the GPU cranks up and completes the update very quickly (about 5 seconds). But in between updates, GPU utilization is at 0% while CPU usage across all cores oscillates like a sine wave between 20% and 80%.

I would like to better understand how the CPU and GPU are being utilized by stable-baselines.

Why is there so much CPU overhead between updates? My (custom) gym environment is very simple: the observations are taken directly from the raw data (a csv file stored in a pandas dataframe), with no transformation or calculation applied at each step. The reward is also very quick to compute; it is just a simple numpy operation on an array of elements (this array grows by one element with every serial timestep). At first I thought the untrained agent would “die” (done=True) a lot in the first few iterations, which would make the time between updates very short. But even when the agent stays “alive” for more steps in later iterations, 88 minutes between updates seems far too long. How can each actor (each CPU) stepping through the environment for 1024 steps take so long?
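
To make the question concrete, here is a stripped-down sketch of what the environment does (the class name, action space and exact reward logic are illustrative, not my real code):

```python
import gym
import numpy as np
from gym import spaces

class MyEnv(gym.Env):
    """Illustrative sketch: observations are raw rows of the dataframe and the
    reward is a cheap numpy operation on an array that grows by one per step."""

    def __init__(self, data):
        super().__init__()
        self.data = data.values.astype(np.float32)  # 7 columns of raw features
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(7,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)      # placeholder action space
        self.t = 0
        self.history = []

    def reset(self):
        self.t = 0
        self.history = []
        return self.data[self.t]

    def step(self, action):
        self.history.append(float(action))          # array grows by one element per step
        reward = float(np.sum(self.history))        # cheap numpy reduction
        self.t += 1
        done = self.t >= len(self.data) - 1
        return self.data[self.t], reward, done, {}
```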

Is there some hidden “minimum loss decrease” parameter? If the algorithm only updates once some minimum change in loss between updates is achieved, that could explain why it takes so long to update. Something analogous would be the min_delta argument of Keras EarlyStopping. If this is the case, can we change it via some PPO2 parameter or kwarg?
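
For reference, the Keras mechanism I have in mind looks like this (values are illustrative):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stops training when the monitored loss improves by less than min_delta
# for `patience` consecutive epochs; I am asking whether PPO2 has anything similar.
early_stop = EarlyStopping(monitor="val_loss", min_delta=0.001, patience=5)
# model.fit(x_train, y_train, validation_split=0.1, callbacks=[early_stop])
```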

Increase the architecture size? Issue #308 seems to suggest that increasing the network architecture size would at least increase GPU utilization at update time (which may also help convergence). I am open to trying this, but I don’t think it really answers my first question above.
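
If I do try it, I assume it would look something like this (a sketch only; n_lstm is the width of the recurrent layer and defaults to 256 for the LSTM policies, and I have not verified that widening it helps here):

```python
from stable_baselines import PPO2

model = PPO2(
    "MlpLnLstmPolicy",
    env,                             # the same vectorised env as above
    policy_kwargs=dict(n_lstm=512),  # wider LSTM cell than the default 256
    n_steps=1024,
    nminibatches=1,
    verbose=1,
)
```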

My ultimate goal is simply to have a fully trained agent that has traversed the entire dataset of 1 million samples in a reasonable amount of time (even 2-3 days). Any changes I can make to my gym environment / PPO2 params, or any explanation of how stable-baselines utilises the hardware, would be much appreciated. Thanks!

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 20

Top GitHub Comments

1 reaction
Miffyli commented, Nov 18, 2019

That behaviour does not sound normal at all, especially if you are using a GPU with stable-baselines (it should utilize the GPU in short spikes for training and the CPU only for the environments).

Two things pop to my mind:

  1. You seem to share the same data object with all workers. I am not too familiar with Pandas to know what this object is exactly, but it could end up being shared in a wonky way. I would move reading the dataset to just before env = MyEnv(data=data), just to make sure this is not breaking things.
  2. Try DummyVecEnv instead of SubprocVecEnv. Since your environment is computationally very simple, using separate Python processes (SubprocVecEnv) adds considerable overhead. See note here for more info, and the sketch below.
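
Something along these lines, combining both suggestions (a rough sketch; MyEnv and the csv path are placeholders for your code):

```python
import pandas as pd
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

def make_env():
    def _init():
        # read the data inside the factory instead of sharing a global dataframe
        data = pd.read_csv("dataset.csv")   # placeholder path
        return MyEnv(data=data)
    return _init

# DummyVecEnv keeps all environments in the main process, so there is no
# inter-process communication overhead for a computationally cheap env.
env = DummyVecEnv([make_env() for _ in range(24)])
model = PPO2("MlpLnLstmPolicy", env, verbose=1)
```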
1 reaction
Miffyli commented, Nov 14, 2019

Hmm, at a glance this environment seems alright and should work fine. Have you tried the “MlpLstm” policy rather than the “MlpLnLstm” policy? That one has worked as expected for me in my experiments. Other than that, I do not have suggestions beyond starting to debug the timings and trying to pin down what takes so long 😕
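
One simple way to begin is to wrap the environment and accumulate the wall-clock time spent inside env.step(), to see whether the time goes into the environment itself or elsewhere (a rough sketch, untested with your setup):

```python
import time
import gym

class StepTimer(gym.Wrapper):
    """Accumulates time spent in env.step() and prints a summary every 1024 steps."""

    def __init__(self, env):
        super().__init__(env)
        self.step_time = 0.0
        self.step_count = 0

    def step(self, action):
        start = time.time()
        result = self.env.step(action)
        self.step_time += time.time() - start
        self.step_count += 1
        if self.step_count % 1024 == 0:
            print("env.step total: %.1f s over %d steps" % (self.step_time, self.step_count))
        return result
```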
