rllib: Using gym.RewardWrapper around MultiAgentEnv causes observation mismatch with observation_space
See original GitHub issue
system config
- python version: 3.6.2
- Ray version: 0.8.4
- Tensorflow version: 1.14.0
- OS: CentOS 7
- gym version: 0.17.1
problem
I was trying to customize my multi-agent env with some reward shaping using gym.RewardWrapper, and it gives me this error:
ValueError: ('Observation outside expected value range', Tuple(Discrete(2), Discrete(2)), {'agent_0': (0, 0), 'agent_1': (0, 0)})
Here is a minimal example:
import numpy as np
import ray
import gym
from gym.spaces import Discrete, Tuple
from ray.rllib.env import MultiAgentEnv
from ray.tune.registry import register_env
import ray.rllib.agents.a3c as a3c

config = {
    'env_config': {
        'num_agents': 2,
    },
    'env': 'NPD',
    'num_workers': 2,
    'use_pytorch': False,
    'train_batch_size': 200,
    'rollout_fragment_length': 200,
    'lr': 0.0001,
}

class Example(MultiAgentEnv):
    def __init__(self, num_agents=2, max_steps=100):
        super(Example, self).__init__()
        self.reward_range = (-np.inf, np.inf)
        self.metadata = {'render.modes': []}
        self.num_agents = num_agents
        self.players = ['agent_' + str(i) for i in range(num_agents)]
        self.action_space = Discrete(2)
        self.observation_space = Tuple(tuple(Discrete(2) for _ in range(num_agents)))
        self.current_step = None
        self.max_steps = max_steps

    def reset(self):
        self.current_step = 0
        # Every agent starts from the all-zeros observation.
        observation = [tuple(0 for _ in range(self.num_agents)) for _ in range(self.num_agents)]
        return dict(zip(self.players, observation))

    def step(self, actions_dict):
        actions = np.array(list(actions_dict.values())).flatten()
        reward = np.array([np.random.random() for _ in range(self.num_agents)])
        reward = dict(zip(self.players, reward))
        # Every agent observes the joint action taken this step.
        observation = dict(zip(self.players, [tuple(actions) for _ in range(self.num_agents)]))
        self.current_step += 1
        done = {'__all__': self.current_step == self.max_steps}
        info = dict(zip(self.players, [{}] * self.num_agents))
        return observation, reward, done, info

class Rwrapper(gym.RewardWrapper):
    def __init__(self, env):
        self.reward_range = env.reward_range
        super(Rwrapper, self).__init__(env)

    def reward(self, reward):
        # Identity shaping; the error occurs even without modifying the reward.
        return reward

register_env('NPD', lambda env_config: Rwrapper(Example(**env_config)))
ray.init(num_cpus=4)
trainer = a3c.A3CTrainer(config=config, env='NPD')
trainer.train()
Running the above code block gives me the error shown earlier.
If I substitute the register_env line with the following one (getting rid of the reward wrapper):
register_env('NPD', lambda env_config: Example(**env_config))
the code executes successfully. I wonder if anyone can help me with this issue. Thank you very much.
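A quick sanity check (a minimal sketch reusing the Example and Rwrapper classes above) shows that the wrapped env itself still returns the ordinary multi-agent observation dict, which suggests the mismatch arises in how RLlib handles the gym.Wrapper rather than in the env:

    env = Rwrapper(Example(num_agents=2))
    print(env.reset())
    # -> {'agent_0': (0, 0), 'agent_1': (0, 0)}, i.e. the dict quoted in the
    #    ValueError, which RLlib then validates against the Tuple observation_space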
Top GitHub Comments
Hey @0luhancheng0 . I’m very sorry, but we have decided that we don’t want to add support for wrapping MultiAgentEnvs (which are not gym.Env children) with gym.Wrappers. In order to handle reward shaping for our MultiAgentEnvs, could you simply write your own custom wrapper or post processing function that takes the rewards and manipulates them? Again, sorry for the inconvenience, but we don’t want to add support for something that would become harder to maintain at some point (b/c the MultiAgentEnv API may evolve over time and thus away from the gym.Env API).
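Following up on that suggestion, one possible shape for such a custom wrapper is sketched below; RewardShapingWrapper and shaping_fn are illustrative names (not part of RLlib), and the idea is simply to stay inside the MultiAgentEnv API and delegate everything to the wrapped env:

    from ray.rllib.env import MultiAgentEnv

    class RewardShapingWrapper(MultiAgentEnv):
        """Sketch of a reward-shaping wrapper that is itself a MultiAgentEnv,
        so RLlib never mistakes it for a single-agent gym.Env."""

        def __init__(self, env, shaping_fn=lambda rew: rew):
            self.env = env
            self.shaping_fn = shaping_fn
            # Mirror the wrapped env's interface.
            self.action_space = env.action_space
            self.observation_space = env.observation_space

        def reset(self):
            return self.env.reset()

        def step(self, action_dict):
            obs, rewards, dones, infos = self.env.step(action_dict)
            # Apply the shaping function to each agent's reward.
            shaped = {agent: self.shaping_fn(rew) for agent, rew in rewards.items()}
            return obs, shaped, dones, infos

    # Example usage with the Example env from the report:
    # register_env('NPD', lambda cfg: RewardShapingWrapper(Example(**cfg),
    #                                                      shaping_fn=lambda r: 2.0 * r))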
Here is the PR that fixes the problem (the test with your script passed). I’m closing this issue, but feel free to re-open it should the problem persist on your end. Thanks again for letting us know about this!
https://github.com/ray-project/ray/pull/8314