Multi-Agent Env Support?
Hello, does the library support multi-agent environments? More precisely, can multiple agents share the environment state, select their actions in parallel, and then return the combined actions to the environment?
-----------------Edit-----------------
After multiple tries, I figured out some tips for training with a multi-agent environment.
How to pass multi-agent observations
If there is only one model for all agents, simply pack all observations into one array and treat it as if it came from a single mega-agent environment.

If there are multiple models, follow the same procedure, and also adapt the algo and agent parts of your code. It is recommended to use torch.nn.ModuleList or torch.nn.ModuleDict to organize the multiple models, then apply the relevant function to each model in parallel, as in the sketch below.
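For instance, here is a minimal sketch of that idea; the MultiAgentModel class, its layer sizes, and the tensor shapes are made up for illustration and are not part of rlpyt:

```python
import torch
import torch.nn as nn

class MultiAgentModel(nn.Module):
    """Hypothetical container holding one sub-model per agent."""

    def __init__(self, obs_dim, action_dim, n_agents):
        super().__init__()
        # nn.ModuleDict also works if you prefer to address agents by name.
        self.models = nn.ModuleList(
            [nn.Linear(obs_dim, action_dim) for _ in range(n_agents)]
        )

    def forward(self, observation):
        # observation: [n_agents, obs_dim]; apply each agent's model to its own slice.
        outputs = [m(observation[i]) for i, m in enumerate(self.models)]
        return torch.stack(outputs, dim=0)  # [n_agents, action_dim]

model = MultiAgentModel(obs_dim=4, action_dim=2, n_agents=3)
out = model(torch.zeros(3, 4))
print(out.shape)  # torch.Size([3, 2])
```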
How to pass multiple reward values
A typical Gym environment step returns a four-element tuple: observation, reward, done, info. The reward in the return of step must be a scalar, because evaluation needs it to compute the total episode reward. However, sometimes you may want a distinct reward for each agent, which has to be a 1-D array. The key to the solution is to pass your actual reward through an output other than reward. To do this, modify the environment as follows:
from collections import OrderedDict  # needed for the info dict below

actual_reward = self.reward()  # Per-agent reward at this step; an array or nested array.
reward = sum(actual_reward)  # Must be a scalar; take the mean instead if you prefer, but remember
# that the Return reported in evaluation is the cumulative reward of the whole episode/trajectory.
info = OrderedDict(
    reward=actual_reward,  # The per-agent reward array is passed through info instead.
)
return observation, reward, done, info
Then, in the algo part of your code, modify initialize_replay_buffer, samples_to_buffer, and any functions relevant to the conversion between samples and buffer:
from rlpyt.utils.collections import namedarraytuple

SamplesToBuffer = namedarraytuple("SamplesToBuffer",
    ["observation", "action", "reward", "done"])

buffer = SamplesToBuffer(
    observation=observation,
    action=samples.agent.action,
    reward=samples.env.reward,  # Change this line to: reward=samples.env.env_info.reward,
    done=samples.env.done,
)

example_to_buffer = SamplesToBuffer(
    observation=examples["observation"],
    action=examples["action"],
    reward=examples["reward"],  # Change this line to: reward=examples["env_info"].reward,
    done=examples["done"],
)
After that, your algo will receive actual_reward (an array or nested array) instead of the scalar reward.
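For example, once the buffer carries an array-valued reward of shape [T, B, n_agents], per-agent discounted returns can be computed much like the scalar case; the function below is only an illustrative sketch, with an invented name and signature rather than rlpyt code:

```python
import torch

def discounted_returns(reward, done, discount=0.99):
    """reward: [T, B, n_agents] float tensor; done: [T, B] bool tensor.
    Returns per-agent discounted returns with the same shape as reward."""
    T = reward.shape[0]
    returns = torch.zeros_like(reward)
    running = torch.zeros_like(reward[0])
    for t in reversed(range(T)):
        not_done = (~done[t]).float().unsqueeze(-1)  # broadcast over the agent dimension
        running = reward[t] + discount * running * not_done
        returns[t] = running
    return returns

r = torch.rand(5, 2, 3)                  # 5 steps, batch of 2, 3 agents
d = torch.zeros(5, 2, dtype=torch.bool)
print(discounted_returns(r, d).shape)    # torch.Size([5, 2, 3])
```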
: ) Hello astooke, the training is finished and it is a great 💯 success.
Many thanks to you and this awesome lib! 🎉 Since the original problem is resolved, I will close this issue and add a summary in the first post about how to handle the output of a multi-agent environment. Once the paper is finished, I would like to cite the whitepaper of this great lib. Have a good day, and Happy Reinforcement Learning!
Hi! The library does not currently support multi-agent environment interactions directly, although I hope it is the sort of thing it could be extended to. One way to do it would be to write the multiple agents into one agent, and then use a Composite action space to collect all of their actions and pass them into the environment. The algorithm would then have access to the multiple agents as well. It could be a fairly quick thing, or there might be some hidden difficulties… let us know if you try and what happens! Happy to answer more questions or help along the way.
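As a rough illustration of that combined-agent idea, here is a sketch only: the wrapper class and the list-based API of the underlying multi-agent environment are hypothetical, and it does not use rlpyt's actual Composite class.

```python
import numpy as np

class JointActionEnvWrapper:
    """Hypothetical wrapper presenting several agents as one 'mega agent':
    the joint action has one row per agent, which the wrapper splits and
    forwards to the underlying multi-agent environment."""

    def __init__(self, multi_agent_env, n_agents):
        self.env = multi_agent_env
        self.n_agents = n_agents

    def reset(self):
        per_agent_obs = self.env.reset()          # assumed: list of per-agent observations
        return np.stack(per_agent_obs, axis=0)    # [n_agents, *obs_shape]

    def step(self, joint_action):
        per_agent_actions = [joint_action[i] for i in range(self.n_agents)]
        obs, rewards, done, info = self.env.step(per_agent_actions)
        info = dict(info, reward=np.asarray(rewards))  # keep per-agent rewards in info
        return np.stack(obs, axis=0), float(np.sum(rewards)), done, info
```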