
Multi-Agent Env Support?


Hello, does the library support multi-agent environments? Or more precisely, does it allow multiple agents to share the environment state, select their actions in parallel, and then return the combined actions to the environment?

----------------- Edit -----------------

After multiple tries, I figured out some tips for training with a multi-agent environment.

How to pass multi-agent observations

If there is only one model for all agents, simply pack all observations into one array and treat it as a single mega-agent environment. If there are multiple models, follow the same procedure, and also adapt the algo and agent parts of your code accordingly. It's recommended to use torch.nn.ModuleList or torch.nn.ModuleDict to organize the multiple models and then apply each model to its agent's part of the observation, as in the sketch below.
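A minimal sketch of the multiple-model setup, assuming the packed ("mega agent") observation has shape [batch, n_agents, obs_dim]; the class name, shapes, and layer choices here are illustrative assumptions, not part of the library:

    import torch
    import torch.nn as nn

    class MultiAgentPolicy(nn.Module):
        """One sub-model per agent, each applied to that agent's slice of the packed observation."""
        def __init__(self, n_agents, obs_dim, act_dim):
            super().__init__()
            self.models = nn.ModuleList(
                [nn.Linear(obs_dim, act_dim) for _ in range(n_agents)]
            )

        def forward(self, obs):
            # obs: [batch, n_agents, obs_dim] -- the packed multi-agent observation.
            outs = [model(obs[:, i]) for i, model in enumerate(self.models)]
            return torch.stack(outs, dim=1)  # [batch, n_agents, act_dim]

If a single shared model is used instead, the agent dimension can simply be folded into the batch dimension before the forward pass.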

How to pass multiple reward values

A typical Gym environment step returns a four-element tuple: observation, reward, done, info. The reward in the return of step must be a scalar, because evaluation needs it to compute the total episode reward. However, sometimes you may want a separate reward for each agent, which would be a 1-D array. The key to the solution is to pass your actual reward through an output other than reward. To resolve this problem, modify the environment like this:

    # Inside the environment's step() method (OrderedDict comes from collections):
    actual_reward = self.reward()  # Per-agent reward at this step; an array or nested array.
    reward = sum(actual_reward)  # The reward returned in the usual slot must be a scalar.
    # Or take a mean if you prefer, but remember that the Return reported in evaluation
    # is the cumulative reward over the whole episode/trajectory.
    info = OrderedDict(
        reward=actual_reward,  # The real, per-agent reward travels through info instead.
    )
    return observation, reward, done, info

Then, in the algo part of your code, modify initialize_replay_buffer, samples_to_buffer, and any other functions relevant to the conversion between samples and buffer:

    SamplesToBuffer = namedarraytuple("SamplesToBuffer",
        ["observation", "action", "reward", "done"])

    # In samples_to_buffer():
    buffer = SamplesToBuffer(
        observation=observation,
        action=samples.agent.action,
        reward=samples.env.reward,  # Change this line to reward=samples.env.env_info.reward,
        done=samples.env.done,
    )

    # In initialize_replay_buffer():
    example_to_buffer = SamplesToBuffer(
        observation=examples["observation"],
        action=examples["action"],
        reward=examples["reward"],  # Change this line to reward=examples["env_info"].reward,
        done=examples["done"],
    )

After that, your algo will receive actual_reward (an array or nested array) instead of the scalar reward.
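As a rough downstream illustration (the shapes and reductions below are assumptions for the sketch, not something the post prescribes), the loss can now either keep the per-agent dimension or reduce it back to a scalar:

    import torch

    # Hypothetical shapes: the reward pulled from the buffer is now [T, B, n_agents]
    # instead of [T, B]; td_error stands in for whatever per-agent error the algo computes.
    reward = torch.randn(20, 8, 3)
    td_error = torch.randn(20, 8, 3)

    per_agent_loss = 0.5 * td_error.pow(2)   # Keep the agent dimension for per-agent losses,
    loss = per_agent_loss.mean()             # or reduce over agents/time/batch for the optimizer.
    joint_reward = reward.sum(dim=-1)        # Or collapse rewards back to one scalar per step.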

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 30 (30 by maintainers)

Top GitHub Comments

2 reactions
wangwwno1 commented, Oct 26, 2019

: ) Hello astooke, the training is finished and is a great 💯 success.
Many thanks to you and this awesome lib! 🎉 Since the original problem is resolved, I will close this issue and add a summary of how to handle the output of a multi-agent environment to the first post. Once the paper is finished, I would like to contribute a citation to the whitepaper of this great lib. Have a good day, and happy Reinforcement Learning!

2 reactions
astooke commented, Sep 12, 2019

Hi! The library does not currently support multi-agent environment interactions directly, although I hope it is the sort of thing it could be extended to. One way to do it would be to write the multiple agents into one agent, and then use a Composite action space to collect all of their actions and pass them into the environment. The algorithm would then have access to the multiple agents as well. It could be a fairly quick thing, or there might be some hidden difficulties…let us know if you try it and what happens! Happy to answer more questions or help along the way.
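Here is one rough way the "several agents behind one agent" idea could be wired up on the environment side, using plain Gym spaces as a stand-in for the library's own Composite space (check the library source for its actual constructor); all names below are illustrative:

    import gym
    import numpy as np

    class JointAgentEnvWrapper(gym.Wrapper):
        """Presents N agents to the algorithm as one agent with a tuple action space."""
        def __init__(self, env, n_agents):
            super().__init__(env)
            # One sub-space per agent, combined into a single composite action space.
            self.action_space = gym.spaces.Tuple(
                tuple(env.action_space for _ in range(n_agents))
            )

        def step(self, joint_action):
            # joint_action holds one entry per agent; the wrapped env is assumed
            # to accept the per-agent actions stacked into one array.
            return self.env.step(np.stack(joint_action))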
