rllib: Using gym.RewardWrapper around MultiAgentEnv causes observation mismatch with observation_space
See original GitHub issue
system config
- python version: 3.6.2
- Ray version: 0.8.4
- Tensorflow version: 1.14.0
- OS: CentOS 7
- gym version: 0.17.1
problem
I was trying to customize my multi-agent env with some reward shaping using gym.RewardWrapper, and it gives me this error:
ValueError: ('Observation outside expected value range', Tuple(Discrete(2), Discrete(2)), {'agent_0': (0, 0), 'agent_1': (0, 0)})
Here is a minimal example:
import numpy as np
import ray
import gym
from gym.spaces import Discrete, Tuple
from ray.rllib.env import MultiAgentEnv
from ray.tune.registry import register_env
import ray.rllib.agents.a3c as a3c

config = {
    'env_config': {
        'num_agents': 2,
    },
    'env': 'NPD',
    'num_workers': 2,
    'use_pytorch': False,
    'train_batch_size': 200,
    'rollout_fragment_length': 200,
    'lr': 0.0001,
}

class Example(MultiAgentEnv):
    def __init__(self, num_agents=2, max_steps=100):
        super(Example, self).__init__()
        self.reward_range = (-np.inf, np.inf)
        self.metadata = {'render.modes': []}
        self.num_agents = num_agents
        self.players = ['agent_' + str(i) for i in range(num_agents)]
        self.action_space = Discrete(2)
        self.observation_space = Tuple(tuple(Discrete(2) for _ in range(num_agents)))
        self.current_step = None
        self.max_steps = max_steps

    def reset(self):
        self.current_step = 0
        # Every agent starts from the all-zeros observation.
        observation = [tuple(0 for _ in range(self.num_agents)) for _ in range(self.num_agents)]
        return dict(zip(self.players, observation))

    def step(self, actions_dict):
        actions = np.array(list(actions_dict.values())).flatten()
        reward = np.array([np.random.random() for _ in range(self.num_agents)])
        reward = dict(zip(self.players, reward))
        # Every agent observes the joint action taken this step.
        observation = dict(zip(self.players, [tuple(actions) for _ in range(self.num_agents)]))
        self.current_step += 1
        done = {'__all__': self.current_step == self.max_steps}
        info = dict(zip(self.players, [{}] * self.num_agents))
        return observation, reward, done, info

class Rwrapper(gym.RewardWrapper):
    def __init__(self, env):
        self.reward_range = env.reward_range
        super(Rwrapper, self).__init__(env)

    def reward(self, reward):
        # Identity shaping; the error occurs even without modifying the reward.
        return reward

register_env('NPD', lambda env_config: Rwrapper(Example(**env_config)))
ray.init(num_cpus=4)
trainer = a3c.A3CTrainer(config=config, env='NPD')
trainer.train()
Running the above code block gives me the error shown earlier.
If I substitute the register_env line with the following one (getting rid of the reward wrapper):
register_env('NPD', lambda env_config: Example(**env_config))
the code executes successfully. I wonder if anyone can help me with this issue. Thank you very much.
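A quick sanity check (a minimal sketch reusing the Example and Rwrapper classes above) shows that the wrapped env itself still returns the ordinary multi-agent observation dict, which suggests the mismatch arises in how RLlib handles the gym.Wrapper rather than in the env:

    env = Rwrapper(Example(num_agents=2))
    print(env.reset())
    # -> {'agent_0': (0, 0), 'agent_1': (0, 0)}, i.e. the dict quoted in the
    #    ValueError, which RLlib then validates against the Tuple observation_space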
Top GitHub Comments
Hey @0luhancheng0 . I’m very sorry, but we have decided that we don’t want to add support for wrapping MultiAgentEnvs (which are not gym.Env children) with gym.Wrappers. In order to handle reward shaping for our MultiAgentEnvs, could you simply write your own custom wrapper or post processing function that takes the rewards and manipulates them? Again, sorry for the inconvenience, but we don’t want to add support for something that would become harder to maintain at some point (b/c the MultiAgentEnv API may evolve over time and thus away from the gym.Env API).
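Following up on that suggestion, one possible shape for such a custom wrapper is sketched below; RewardShapingWrapper and shaping_fn are illustrative names (not part of RLlib), and the idea is simply to stay inside the MultiAgentEnv API and delegate everything to the wrapped env:

    from ray.rllib.env import MultiAgentEnv

    class RewardShapingWrapper(MultiAgentEnv):
        """Sketch of a reward-shaping wrapper that is itself a MultiAgentEnv,
        so RLlib never mistakes it for a single-agent gym.Env."""

        def __init__(self, env, shaping_fn=lambda rew: rew):
            self.env = env
            self.shaping_fn = shaping_fn
            # Mirror the wrapped env's interface.
            self.action_space = env.action_space
            self.observation_space = env.observation_space

        def reset(self):
            return self.env.reset()

        def step(self, action_dict):
            obs, rewards, dones, infos = self.env.step(action_dict)
            # Apply the shaping function to each agent's reward.
            shaped = {agent: self.shaping_fn(rew) for agent, rew in rewards.items()}
            return obs, shaped, dones, infos

    # Example usage with the Example env from the report:
    # register_env('NPD', lambda cfg: RewardShapingWrapper(Example(**cfg),
    #                                                      shaping_fn=lambda r: 2.0 * r))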
Here is the PR that fixes the problem (the test with your script passed). I’m closing this issue, but feel free to re-open it should the problem persist on your end. Thanks again for letting us know about this!
https://github.com/ray-project/ray/pull/8314