
MADDPG with horizon


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04, but the same error occurs on macOS 10.14.6
  • Ray installed from (source or binary): source
  • Ray version: 0.8.0.dev5
  • Python version: 3.6.9
  • Exact command to reproduce: I’m trying to use the MADDPG algorithm to train 180 agents, divided into 60 agents with a DDPG policy and 120 with a MADDPG one.

I’ve set the horizon to 1500, but I would like to use 4000 later on. The batch settings are the following (a rough config sketch follows the list):

  • sample_batch_size=100
  • train_batch_size=400
  • learning_starts=2
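
For context, the run is wired up roughly as follows. This is a minimal sketch, not the exact script: the environment name, observation/action spaces, and policy names are placeholders, the per-policy options (such as which agents use local critics) are omitted, and the contrib/MADDPG config keys may differ slightly between Ray versions.

import ray
from ray import tune
from gym.spaces import Box

# Placeholder spaces standing in for the real environment's.
obs_space = Box(-1.0, 1.0, shape=(10,))
act_space = Box(-1.0, 1.0, shape=(2,))

# 180 agents: in the real setup the first 60 use a DDPG policy and the
# remaining 120 a MADDPG one; here every agent gets its own policy entry
# with an empty per-policy config for brevity.
policies = {
    "agent_{}".format(i): (None, obs_space, act_space, {})
    for i in range(180)
}

ray.init()
tune.run(
    "contrib/MADDPG",  # the contrib trainer string registered by RLlib
    config={
        "env": "my_env",  # placeholder for the custom multi-agent env
        "horizon": 1500,
        "sample_batch_size": 100,
        "train_batch_size": 400,
        "learning_starts": 2,
        "multiagent": {
            "policies": policies,
            # Map each agent to the policy with the same name.
            "policy_mapping_fn": lambda agent_id: agent_id,
        },
    },
)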

Describe the problem

When the optimizer tries to sample observations from the replay buffer, I get the following error:

Traceback (most recent call last):
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trial_runner.py", line 438, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/ray_trial_executor.py", line 351, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/worker.py", line 2121, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray_MADDPG:train() (pid=30410, ip=100.81.9.4)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 421, in train
    raise e
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 407, in train
    result = Trainable.train(self)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trainable.py", line 176, in train
    result = self._train()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer_template.py", line 129, in _train
    fetches = self.optimizer.step()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 142, in step
    self._optimize()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 162, in _optimize
    samples = self._replay()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 205, in _replay
    dones) = replay_buffer.sample_with_idxes(idxes)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 81, in sample_with_idxes
    return self._encode_sample(idxes)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 60, in _encode_sample
    data = self._storage[i]
IndexError: list index out of range

I’ve tried changing the above-mentioned parameters, but the only one that seems to make a difference is the horizon, which (if set to <=15) does not trigger the IndexError. Any idea how to fix this?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 22 (9 by maintainers)

Top GitHub Comments

1 reaction
nicofirst1 commented, Oct 15, 2019

I think you’re right and the env just removes some agents during the episode. I’ll fix it and update you as soon as possible.
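
To illustrate the failure mode: the sample_with_idxes call in the traceback applies one shared set of indices to every agent’s replay buffer so that the sampled transitions stay time-aligned across agents. If the env removes an agent mid-episode, that agent’s buffer receives fewer transitions than the others, and the shared indices can run past its end. A toy sketch with simplified stand-ins (not RLlib’s actual classes):

class Buffer:
    # Simplified stand-in for RLlib's ReplayBuffer.
    def __init__(self):
        self._storage = []

    def add(self, item):
        self._storage.append(item)

    def sample_with_idxes(self, idxes):
        # Like ReplayBuffer._encode_sample: assumes every index is valid.
        return [self._storage[i] for i in idxes]

buffers = {"agent_0": Buffer(), "agent_1": Buffer()}
for t in range(100):
    buffers["agent_0"].add(t)
for t in range(60):  # agent_1 was removed after step 60
    buffers["agent_1"].add(t)

# One shared index set for all buffers, as in MADDPG's synchronized replay.
idxes = [10, 42, 77, 99]
for name, buf in buffers.items():
    try:
        buf.sample_with_idxes(idxes)
    except IndexError:
        print("IndexError for {} with idxes {}".format(name, idxes))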

0 reactions
stale[bot] commented, Nov 28, 2020

Hi again! This issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you’d still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray’s public Slack channel.

Thanks again for opening the issue!

Read more comments on GitHub.

Top Results From Across the Web

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
MADDPG learns the correct behavior in both cases: in CC the speaker learns to output the target landmark color to direct the listener, ...

Can AI Learn to Cooperate? Multi Agent Deep Deterministic Policy Gradients (MADDPG) in PyTorch

Multi-Agent Deep Reinforcement Learning: Revisiting MADDPG
MADDPG combines the multi-agent actor-critic (MAAC) method with the DDPG ... where T is some finite time horizon less than the episode length, ...

Cooperative Multiagent Deep Deterministic Policy Gradient ...
Thus, this paper proposes a Cooperative MADDPG (CoMADDPG) for connected vehicles at ... where T is the time horizon and γ is a discount factor, ...

Algorithms — Ray 2.2.0 - the Ray documentation
MADDPG. tf. Yes. Partial. Yes. Parameter Sharing. Depends on bootstrapped algorithm ... imagine_horizon – Imagination horizon for training Actor and Critic.
