Multi-Agent Env Support?
Hello, does the library support multi-agent environments? More precisely, can multiple agents share the environment state, select their actions in parallel, and then return the combined actions to the environment?
-----------------Edit-----------------
After multiple tries, I figured out some tips for training with a multi-agent environment.
How to pass multi-agent observations
If there is only one model for all agents, simply pack all observations into one array and treat it as if it came from a single mega-agent environment.

If there are multiple models, follow the same procedure, and also adapt the algo and agent parts of your code. It is recommended to use torch.nn.ModuleList or torch.nn.ModuleDict to organize the multiple models, then apply the relevant function to each model in parallel, as in the sketch below.
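For instance, here is a minimal sketch of that idea; the MultiAgentModel class, its layer sizes, and the tensor shapes are made up for illustration and are not part of rlpyt:

```python
import torch
import torch.nn as nn

class MultiAgentModel(nn.Module):
    """Hypothetical container holding one sub-model per agent."""

    def __init__(self, obs_dim, action_dim, n_agents):
        super().__init__()
        # nn.ModuleDict also works if you prefer to address agents by name.
        self.models = nn.ModuleList(
            [nn.Linear(obs_dim, action_dim) for _ in range(n_agents)]
        )

    def forward(self, observation):
        # observation: [n_agents, obs_dim]; apply each agent's model to its own slice.
        outputs = [m(observation[i]) for i, m in enumerate(self.models)]
        return torch.stack(outputs, dim=0)  # [n_agents, action_dim]

model = MultiAgentModel(obs_dim=4, action_dim=2, n_agents=3)
out = model(torch.zeros(3, 4))
print(out.shape)  # torch.Size([3, 2])
```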
How to pass multiple reward values
A typical Gym environment step returns a four-element tuple: observation, reward, done, info. The reward in the return of step must be a scalar, because evaluation needs it to compute the total episode reward. However, sometimes you may want a distinct reward for each agent, which has to be a 1-D array. The key to the solution is to pass your actual reward through an output other than reward. To do this, modify the environment as follows:
from collections import OrderedDict  # needed for the info dict below

actual_reward = self.reward()  # Per-agent reward at this step; an array or nested array.
reward = sum(actual_reward)  # Must be a scalar; take the mean instead if you prefer, but remember
# that the Return reported in evaluation is the cumulative reward of the whole episode/trajectory.
info = OrderedDict(
    reward=actual_reward,  # The per-agent reward array is passed through info instead.
)
return observation, reward, done, info
Then, in the algo part of your code, modify initialize_replay_buffer, samples_to_buffer, and any functions relevant to the conversion between samples and buffer:
from rlpyt.utils.collections import namedarraytuple

SamplesToBuffer = namedarraytuple("SamplesToBuffer",
    ["observation", "action", "reward", "done"])

buffer = SamplesToBuffer(
    observation=observation,
    action=samples.agent.action,
    reward=samples.env.reward,  # Change this line to: reward=samples.env.env_info.reward,
    done=samples.env.done,
)

example_to_buffer = SamplesToBuffer(
    observation=examples["observation"],
    action=examples["action"],
    reward=examples["reward"],  # Change this line to: reward=examples["env_info"].reward,
    done=examples["done"],
)
After that, your algo will receive actual_reward (an array or nested array) instead of the scalar reward.
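For example, once the buffer carries an array-valued reward of shape [T, B, n_agents], per-agent discounted returns can be computed much like the scalar case; the function below is only an illustrative sketch, with an invented name and signature rather than rlpyt code:

```python
import torch

def discounted_returns(reward, done, discount=0.99):
    """reward: [T, B, n_agents] float tensor; done: [T, B] bool tensor.
    Returns per-agent discounted returns with the same shape as reward."""
    T = reward.shape[0]
    returns = torch.zeros_like(reward)
    running = torch.zeros_like(reward[0])
    for t in reversed(range(T)):
        not_done = (~done[t]).float().unsqueeze(-1)  # broadcast over the agent dimension
        running = reward[t] + discount * running * not_done
        returns[t] = running
    return returns

r = torch.rand(5, 2, 3)                  # 5 steps, batch of 2, 3 agents
d = torch.zeros(5, 2, dtype=torch.bool)
print(discounted_returns(r, d).shape)    # torch.Size([5, 2, 3])
```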
: ) Hello astooke, the training is finished and it is a great 💯 success.
Many thanks to you and this awesome lib! 🎉 Since the original problem is resolved, I will close this issue and add a summary in the first post about how to handle the output of a multi-agent environment. Once the paper is finished, I would like to cite the whitepaper of this great lib. Have a good day, and Happy Reinforcement Learning!
Hi! The library does not currently support multi-agent environment interactions directly, although I hope it is the sort of thing it could be extended to. One way to do it would be to write the multiple agents into one agent, and then use a Composite action space to collect all of their actions and pass them into the environment. The algorithm would then have access to the multiple agents as well. It could be a fairly quick thing, or there might be some hidden difficulties… let us know if you try and what happens! Happy to answer more questions or help along the way.
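As a rough illustration of that combined-agent idea, here is a sketch only: the wrapper class and the list-based API of the underlying multi-agent environment are hypothetical, and it does not use rlpyt's actual Composite class.

```python
import numpy as np

class JointActionEnvWrapper:
    """Hypothetical wrapper presenting several agents as one 'mega agent':
    the joint action has one row per agent, which the wrapper splits and
    forwards to the underlying multi-agent environment."""

    def __init__(self, multi_agent_env, n_agents):
        self.env = multi_agent_env
        self.n_agents = n_agents

    def reset(self):
        per_agent_obs = self.env.reset()          # assumed: list of per-agent observations
        return np.stack(per_agent_obs, axis=0)    # [n_agents, *obs_shape]

    def step(self, joint_action):
        per_agent_actions = [joint_action[i] for i in range(self.n_agents)]
        obs, rewards, done, info = self.env.step(per_agent_actions)
        info = dict(info, reward=np.asarray(rewards))  # keep per-agent rewards in info
        return np.stack(obs, axis=0), float(np.sum(rewards)), done, info
```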