ray tune error with multiagent policy graph
My error happens in run_experiments(), specifically:
ray.tune.error.TuneError: ('Trials did not complete', [PPO_WaveAttenuationPOEnv-v0_0_lr=1e-05])
Closing connection to TraCI and stopping simulation.
My code is based on the multiagent example “multiagent_stabilizing_the_ring”. Basically, I want to run multiple RL CAVs on the same ring road with a shared PPO policy. Please let me know if I have misunderstood something. Unlike that example, I set
```python
env_name="WaveAttenuationPOEnv",
scenario="LoopScenario",
```
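For context, those two fields would sit in the experiment's flow_params dict roughly as below. This is a minimal sketch assuming the flow_params layout used by the older Flow examples; the exp_tag and all elided entries are placeholders, not values from this issue.

```python
# Sketch only: assumed flow_params layout from the older Flow examples.
# exp_tag is hypothetical; the remaining entries are unchanged from the
# multiagent_stabilizing_the_ring example and are elided here.
flow_params = dict(
    exp_tag="ring_shared_ppo",          # hypothetical experiment tag
    env_name="WaveAttenuationPOEnv",    # changed relative to the multiagent example
    scenario="LoopScenario",            # changed relative to the multiagent example
    # ... sim/env/net/vehicle/initial parameters as in the original example ...
)
```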
The policy graph part was kept the same:
```python
def gen_policy():
    return (PPOPolicyGraph, obs_space, act_space, {})

# Set up PG with an ensemble of `num_policies` different policy graphs
policy_graphs = {'av': gen_policy()}

def policy_mapping_fn(_):
    return 'av'

config.update({
    'multiagent': {
        'policy_graphs': policy_graphs,
        'policy_mapping_fn': tune.function(policy_mapping_fn),
        'policies_to_train': ['av']
    }
})
```
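For reference, here is a minimal sketch of how a config like this is then handed to run_experiments() in the older Ray Tune API that these examples use. The experiment tag, stopping criterion, and checkpoint settings below are illustrative assumptions, not values from the issue.

```python
from ray.tune import run_experiments

# Sketch only: the experiment-spec keys follow the older Ray Tune API used by
# these Flow examples; the tag, stop condition and checkpoint_freq are
# illustrative.
run_experiments({
    "ring_shared_ppo": {
        "run": "PPO",                        # trainer matching PPOPolicyGraph
        "env": "WaveAttenuationPOEnv-v0",    # registered env name from the error above
        "config": config,                    # the dict updated with 'multiagent' above
        "checkpoint_freq": 20,
        "stop": {"training_iteration": 200},
        "num_samples": 1,
    }
})
```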
Additionally, it is still not clear to me whether the shared policy is defined over single-agent states and actions or over the joint states and actions. Any help would be truly appreciated!
Issue Analytics
- State:
- Created 4 years ago
- Comments: 9 (5 by maintainers)
Top Results From Across the Web
- Tune doesn't work with multi agent env · Issue #3785 · ray-...
  The issue is that tune is trying to expand lambda functions to generate trial variants. To fix that, you can 'escape' the policy...
- How To Customize Policies — Ray 2.2.0
  Policy classes encapsulate the core numerical components of RL algorithms. This typically includes the policy model that determines actions to take, a...
- Policies — Ray 2.2.0
  The Policy class contains functionality to compute actions for decision making in an environment, as well as computing loss(es) and gradients, updating a...
- Examples — Ray 2.2.0
  This blog post is a brief tutorial on multi-agent RL and its design in RLlib. ... This script offers a simple workflow for...
- Environments — Ray 2.2.0
  Here we plot just the throughput of RLlib policy evaluation from 1 to 128 CPUs. ... This API allows you to implement any...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
That is correct; batches are collected from both RL vehicles and used for the training.
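To make that concrete, here is a small illustrative sketch of what one step of a multi-agent env looks like with this shared policy: each RL vehicle shows up under its own agent id with its own single-agent observation, policy_mapping_fn sends every id to 'av', and all of those transitions end up in the batch that trains that one policy. The agent ids and numbers are made up.

```python
# Illustrative only: per-agent dicts returned by an RLlib multi-agent env.
# Each RL vehicle contributes its own (single-agent) observation and reward;
# nothing is concatenated into a joint state.
obs = {
    "rl_0": [0.31, 0.12],   # hypothetical observation of the first RL vehicle
    "rl_1": [0.28, 0.40],   # hypothetical observation of the second RL vehicle
}
rewards = {"rl_0": 0.9, "rl_1": 1.1}
dones = {"rl_0": False, "rl_1": False, "__all__": False}

# Both agents map to the same shared policy, so both sets of experience are
# collected into the batch used to train 'av'.
assert policy_mapping_fn("rl_0") == policy_mapping_fn("rl_1") == "av"
```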
Ah, glad that helped. So, for each policy graph you can just construct your desired action space as is currently done in the action_space and observation_space methods in the MultiWaveAttenuationEnv. For example, if you want to control two accelerations at once, you might make the action space Box(low=min_accel, high=max_accel, shape=(2,)) which will tell the policy graph to have two values as the output of the neural network.
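As a concrete (and purely illustrative) version of that suggestion, the per-policy spaces could be built with gym.spaces.Box; the bounds and the observation size below are placeholders, not values taken from Flow.

```python
import numpy as np
from gym.spaces import Box

# Placeholder bounds; in Flow these would come from the env parameters.
max_accel, max_decel = 1.0, 1.0

# Action space controlling two accelerations at once, as suggested above:
# shape=(2,) makes the policy network output two values per step.
act_space = Box(low=-max_decel, high=max_accel, shape=(2,), dtype=np.float32)

# Matching observation space; the size (e.g. speed and headway for each of
# the two controlled vehicles) is illustrative only.
obs_space = Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

# These are the spaces that would then be passed into gen_policy() above.
```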