MADDPG with horizon
See original GitHub issue

System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04, but the same error occurs on macOS 10.14.6
- Ray installed from (source or binary): source
- Ray version: 0.8.0.dev5
- Python version: 3.6.9
- Exact command to reproduce: I’m trying to use the MADDPG algorithm to train 180 agents, divided into 60 agents with a DDPG policy and 120 with a MADDPG one.
I’ve set the horizon to 1500, but I would like to use 4000 later on; the batch settings are the following (a config sketch is shown after this list):
- sample_batch_size = 100
- train_batch_size = 400
- learning_starts = 2
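For illustration, here is a minimal sketch of the kind of Tune setup described above. The env name, the observation/action spaces, the integer agent IDs, and the per-policy "agent_id" / "use_local_critic" keys of the contrib MADDPG trainer are assumptions and placeholders, not the original script:

```python
import ray
from ray import tune
from gym.spaces import Box, Discrete

# Placeholder spaces -- the real environment's spaces would go here.
obs_space = Box(low=-1.0, high=1.0, shape=(10,))
act_space = Discrete(5)

# 60 DDPG-style agents plus 120 MADDPG agents, mirroring the setup above.
# "agent_id" / "use_local_critic" are per-policy options of the contrib MADDPG
# trainer; verify the exact keys against your Ray version.
policies = {
    "ddpg_{}".format(i): (None, obs_space, act_space,
                          {"agent_id": i, "use_local_critic": True})
    for i in range(60)
}
policies.update({
    "maddpg_{}".format(i): (None, obs_space, act_space, {"agent_id": i})
    for i in range(60, 180)
})

config = {
    "env": "my_multiagent_env",   # hypothetical registered env name
    "horizon": 1500,              # intended to become 4000 later
    "sample_batch_size": 100,
    "train_batch_size": 400,
    "learning_starts": 2,
    "multiagent": {
        "policies": policies,
        # Assumes the env uses integer agent IDs 0..179.
        "policy_mapping_fn": lambda agent_id: (
            "ddpg_{}".format(agent_id) if agent_id < 60
            else "maddpg_{}".format(agent_id)),
    },
}

ray.init()
tune.run("contrib/MADDPG", config=config, stop={"training_iteration": 10})
```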
Describe the problem
When the policy tries to sample observations from the replay buffer, I get the following error:
Traceback (most recent call last):
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trial_runner.py", line 438, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/ray_trial_executor.py", line 351, in fetch_result
result = ray.get(trial_future[0])
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/worker.py", line 2121, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray_MADDPG:train() (pid=30410, ip=100.81.9.4)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 421, in train
raise e
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 407, in train
result = Trainable.train(self)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trainable.py", line 176, in train
result = self._train()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer_template.py", line 129, in _train
fetches = self.optimizer.step()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 142, in step
self._optimize()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 162, in _optimize
samples = self._replay()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 205, in _replay
dones) = replay_buffer.sample_with_idxes(idxes)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 81, in sample_with_idxes
return self._encode_sample(idxes)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 60, in _encode_sample
data = self._storage[i]
IndexError: list index out of range
I’ve tried changing the above-mentioned parameters, but the only one that seems to make a difference is the horizon, which (if set to <=15) does not trigger the IndexError. Any idea how to fix this?
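For illustration, the failing line indexes a plain Python list (`self._storage[i]`) with indices drawn for the whole train batch. With MADDPG the per-policy replay buffers are sampled with one shared set of indices (synchronized sampling is an assumption about the contrib trainer's optimizer settings), so if one policy's buffer ends up shorter than the others, the shared indices run past its end. A simplified sketch of that failure mode, not the actual RLlib classes:

```python
import random


class TinyReplayBuffer:
    """Simplified stand-in for rllib/optimizers/replay_buffer.ReplayBuffer."""

    def __init__(self):
        self._storage = []

    def add(self, item):
        self._storage.append(item)

    def sample_idxes(self, batch_size):
        return [random.randint(0, len(self._storage) - 1)
                for _ in range(batch_size)]

    def sample_with_idxes(self, idxes):
        # Mirrors _encode_sample: every index must be valid for this buffer.
        return [self._storage[i] for i in idxes]


# One buffer per policy. If some agents drop out of the episode early, their
# policy's buffer collects fewer transitions than the others.
buffers = {"policy_a": TinyReplayBuffer(), "policy_b": TinyReplayBuffer()}
for t in range(100):
    buffers["policy_a"].add(t)
    if t < 40:  # policy_b's agents vanish after step 40
        buffers["policy_b"].add(t)

# With synchronized sampling, indices are drawn once (here from policy_a's
# buffer) and reused for every buffer: valid for policy_a, out of range for
# the shorter policy_b buffer.
idxes = buffers["policy_a"].sample_idxes(400)
for pid, buf in buffers.items():
    buf.sample_with_idxes(idxes)  # raises IndexError for policy_b
```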
Issue Analytics
- State:
- Created 4 years ago
- Comments: 22 (9 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I think you’re right and the env just removes some agents during the episode. I’ll fix it and update you as soon as possible.
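For reference, one way to avoid the mismatch is to keep every agent in the `step()` return until the episode ends, padding inactive agents with a neutral observation and zero reward instead of omitting them, so all per-policy replay buffers grow in lockstep. A hedged sketch with a toy env (the agent bookkeeping is hypothetical, not the poster's environment):

```python
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class PaddedAgentsEnv(MultiAgentEnv):
    """Toy env: agent 1 'finishes' at step 40 but stays in the returned dicts
    with a neutral observation and zero reward until the episode ends."""

    def __init__(self, config=None):
        self.agent_ids = [0, 1]
        self.t = 0

    def reset(self):
        self.t = 0
        return {i: np.zeros(4, dtype=np.float32) for i in self.agent_ids}

    def step(self, action_dict):
        self.t += 1
        active = {0: True, 1: self.t < 40}  # agent 1 would normally drop out here
        obs, rew, done = {}, {}, {}
        for i in self.agent_ids:
            obs[i] = np.zeros(4, dtype=np.float32)  # padded / neutral observation
            rew[i] = 1.0 if active[i] else 0.0      # inactive agents get zero reward
            done[i] = False                         # agents only finish with the episode
        done["__all__"] = self.t >= 100
        return obs, rew, done, {}
```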
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you’d still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray’s public Slack channel.
Thanks again for opening the issue!