[rllib] Nearly no parallelization while training PPOAgent
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
- Ray installed from (source or binary): binary
- Ray version: 0.6.2
- Python version: 3.6.7
- Exact command to reproduce:
Describe the problem
I am training a PPOAgent with a custom single-agent environment on a Kubernetes cluster on AWS with one head node and 3 worker nodes. Each of them has 3500 mCPU requested and is limited to that. When I start the training, usually only the head node seems to use more than one CPU; the 3 worker nodes use at most 1 CPU each. Is this a property of PPO or is this a bug? Or did I just miss something?
Source code / logs
import ray
import ray.rllib.agents.ppo as ppo
from ray.tune.logger import pretty_print
from ray.tune.registry import register_env

def env_creator(env_config):
    # Import inside the creator so remote workers can construct the env as well.
    import gym
    import simple_beer_game
    env = gym.make('SimpleBeerGame-v1')
    return env

# This code is executed on the head node, where redis is running.
ray.init(redis_address='localhost:6379')

config = ppo.DEFAULT_CONFIG.copy()
config['env_config'] = {}
config['gamma'] = 0.9
config['model']['conv_filters'] = None
config['model']['fcnet_activation'] = 'relu'
config['num_workers'] = 1
config['model']['fcnet_hiddens'] = [50, 100, 100]

register_env("SimpleBeerGame", env_creator)
agent = ppo.PPOAgent(config=config, env="SimpleBeerGame")

for i in range(100):
    result = agent.train()
    print(pretty_print(result))  # log each training iteration
Ray is started with this command on the head node:
ray start --block --head --no-ui --redis-port "${REDIS_PORT}" --object-manager-port "${OBJECT_MANAGER_PORT}" --node-manager-port "${NODE_MANAGER_PORT}"
and this command on the worker nodes:
ray start --block --redis-address "${RAY_HEAD_SVC}":"${REDIS_PORT}" --object-manager-port "${OBJECT_MANAGER_PORT}" --node-manager-port "${NODE_MANAGER_PORT}"
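Before kicking off the training, it can help to verify that all worker nodes actually joined the cluster and that their CPUs are visible to Ray. Below is a minimal sketch; it assumes a Ray version that exposes ray.cluster_resources() (the exact API for querying cluster state differs between Ray releases):

import ray

# Connect to the running cluster, same redis address as in the training script.
ray.init(redis_address='localhost:6379')

# If all 3 worker nodes registered with the head node, the reported CPU count
# should be roughly 4 nodes x 3.5 CPUs; if only the head node's CPUs show up,
# the workers never joined and no parallel rollouts can be scheduled there.
print(ray.cluster_resources())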
Screenshot: CPU usage across the nodes while training.
Top GitHub Comments
I just tried this out on a cluster:

rllib train --env=CartPole-v0 --run=PPO --config='{"num_workers": 15, "train_batch_size": 400000}' --redis-address=localhost:6379

and saw ray_PolicyEvaluator processes using CPU on all nodes. However, there was a bias towards the head node until I increased train_batch_size to 400000. This is probably just because with a smaller batch size most of the CPU is used by TensorFlow doing SGD, and that only happens on the head node. So perhaps you just need to increase train_batch_size? I think it's inherent, though, that PPO will have some extra CPU usage on the head node due to its use of synchronous optimization. You can also try out APPO, which uses the IMPALA async strategy.
Ok! Thank you again!