MADDPG with horizon
See original GitHub issue

System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04, but the same error occurs on macOS 10.14.6
- Ray installed from (source or binary): source
- Ray version: 0.8.0.dev5
- Python version: 3.6.9
- Exact command to reproduce: I’m trying to use the MADDPG algorithm to train 180 agents, divided into 60 agents with a DDPG policy and 120 with a MADDPG one.
I’ve set the horizon to 1500, but I would like to use 4000 later on; the batch settings are the following (a config sketch is shown after this list):
- sample_batch_size = 100
- train_batch_size = 400
- learning_starts = 2
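For illustration, here is a minimal sketch of the kind of Tune setup described above. The env name, the observation/action spaces, the integer agent IDs, and the per-policy "agent_id" / "use_local_critic" keys of the contrib MADDPG trainer are assumptions and placeholders, not the original script:

```python
import ray
from ray import tune
from gym.spaces import Box, Discrete

# Placeholder spaces -- the real environment's spaces would go here.
obs_space = Box(low=-1.0, high=1.0, shape=(10,))
act_space = Discrete(5)

# 60 DDPG-style agents plus 120 MADDPG agents, mirroring the setup above.
# "agent_id" / "use_local_critic" are per-policy options of the contrib MADDPG
# trainer; verify the exact keys against your Ray version.
policies = {
    "ddpg_{}".format(i): (None, obs_space, act_space,
                          {"agent_id": i, "use_local_critic": True})
    for i in range(60)
}
policies.update({
    "maddpg_{}".format(i): (None, obs_space, act_space, {"agent_id": i})
    for i in range(60, 180)
})

config = {
    "env": "my_multiagent_env",   # hypothetical registered env name
    "horizon": 1500,              # intended to become 4000 later
    "sample_batch_size": 100,
    "train_batch_size": 400,
    "learning_starts": 2,
    "multiagent": {
        "policies": policies,
        # Assumes the env uses integer agent IDs 0..179.
        "policy_mapping_fn": lambda agent_id: (
            "ddpg_{}".format(agent_id) if agent_id < 60
            else "maddpg_{}".format(agent_id)),
    },
}

ray.init()
tune.run("contrib/MADDPG", config=config, stop={"training_iteration": 10})
```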
Describe the problem
When the policy tries to sample observations from the replay buffer, I get the following error:
Traceback (most recent call last):
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trial_runner.py", line 438, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/ray_trial_executor.py", line 351, in fetch_result
result = ray.get(trial_future[0])
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/worker.py", line 2121, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray_MADDPG:train() (pid=30410, ip=100.81.9.4)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 421, in train
raise e
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 407, in train
result = Trainable.train(self)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trainable.py", line 176, in train
result = self._train()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer_template.py", line 129, in _train
fetches = self.optimizer.step()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 142, in step
self._optimize()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 162, in _optimize
samples = self._replay()
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 205, in _replay
dones) = replay_buffer.sample_with_idxes(idxes)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 81, in sample_with_idxes
return self._encode_sample(idxes)
File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 60, in _encode_sample
data = self._storage[i]
IndexError: list index out of range
I’ve tried changing the above-mentioned parameters, but the only one that seems to make a difference is the horizon, which (if set to <=15) does not trigger the IndexError. Any idea how to fix this?
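For illustration, the failing line indexes a plain Python list (`self._storage[i]`) with indices drawn for the whole train batch. With MADDPG the per-policy replay buffers are sampled with one shared set of indices (synchronized sampling is an assumption about the contrib trainer's optimizer settings), so if one policy's buffer ends up shorter than the others, the shared indices run past its end. A simplified sketch of that failure mode, not the actual RLlib classes:

```python
import random


class TinyReplayBuffer:
    """Simplified stand-in for rllib/optimizers/replay_buffer.ReplayBuffer."""

    def __init__(self):
        self._storage = []

    def add(self, item):
        self._storage.append(item)

    def sample_idxes(self, batch_size):
        return [random.randint(0, len(self._storage) - 1)
                for _ in range(batch_size)]

    def sample_with_idxes(self, idxes):
        # Mirrors _encode_sample: every index must be valid for this buffer.
        return [self._storage[i] for i in idxes]


# One buffer per policy. If some agents drop out of the episode early, their
# policy's buffer collects fewer transitions than the others.
buffers = {"policy_a": TinyReplayBuffer(), "policy_b": TinyReplayBuffer()}
for t in range(100):
    buffers["policy_a"].add(t)
    if t < 40:  # policy_b's agents vanish after step 40
        buffers["policy_b"].add(t)

# With synchronized sampling, indices are drawn once (here from policy_a's
# buffer) and reused for every buffer: valid for policy_a, out of range for
# the shorter policy_b buffer.
idxes = buffers["policy_a"].sample_idxes(400)
for pid, buf in buffers.items():
    buf.sample_with_idxes(idxes)  # raises IndexError for policy_b
```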
Issue Analytics
- State:
- Created 4 years ago
- Comments: 22 (9 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I think you’re right and the env just removes some agents during the episode. I’ll fix it and update you as soon as possible.
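For reference, one way to avoid the mismatch is to keep every agent in the `step()` return until the episode ends, padding inactive agents with a neutral observation and zero reward instead of omitting them, so all per-policy replay buffers grow in lockstep. A hedged sketch with a toy env (the agent bookkeeping is hypothetical, not the poster's environment):

```python
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class PaddedAgentsEnv(MultiAgentEnv):
    """Toy env: agent 1 'finishes' at step 40 but stays in the returned dicts
    with a neutral observation and zero reward until the episode ends."""

    def __init__(self, config=None):
        self.agent_ids = [0, 1]
        self.t = 0

    def reset(self):
        self.t = 0
        return {i: np.zeros(4, dtype=np.float32) for i in self.agent_ids}

    def step(self, action_dict):
        self.t += 1
        active = {0: True, 1: self.t < 40}  # agent 1 would normally drop out here
        obs, rew, done = {}, {}, {}
        for i in self.agent_ids:
            obs[i] = np.zeros(4, dtype=np.float32)  # padded / neutral observation
            rew[i] = 1.0 if active[i] else 0.0      # inactive agents get zero reward
            done[i] = False                         # agents only finish with the episode
        done["__all__"] = self.t >= 100
        return obs, rew, done, {}
```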
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you’d still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray’s public Slack channel.
Thanks again for opening the issue!