
rllib: Using gym.RewardWrapper around MultiAgentEnv causes observation mismatch with observation_space


system config

  • python version: 3.6.2
  • Ray version: 0.8.4
  • Tensorflow version: 1.14.0
  • OS: CentOS 7
  • gym version: 0.17.1

problem

I was trying to customize my multi-agent env with some reward shaping using gym.RewardWrapper, and it gives me this error:

ValueError: ('Observation outside expected value range', Tuple(Discrete(2), Discrete(2)), {'agent_0': (0, 0), 'agent_1': (0, 0)})

Here is a minimal example:

import numpy as np
import ray
import gym
from gym.spaces import Discrete, Tuple, Dict
from ray.rllib.env import MultiAgentEnv
from ray.tune.registry import register_env
import ray.rllib.agents.a3c as a3c

config = {
    'env_config': {
        'num_agents': 2,
    },
    'env': 'NPD',
    'num_workers': 2,
    'use_pytorch': False,
    'train_batch_size': 200,
    'rollout_fragment_length': 200,
    'lr': 0.0001
}

class Example(MultiAgentEnv):

    def __init__(self, num_agents=2, max_steps=100):

        super(Example, self).__init__()
        self.reward_range = (-np.inf, np.inf)
        self.metadata = {'render.modes': []}
        self.num_agents = num_agents
        self.players = [ 'agent_' + str(i) for i in range(num_agents)]
        self.action_space = Discrete(2)
        self.observation_space = Tuple(tuple(Discrete(2) for _ in range(num_agents)))
        self.current_step = None
        self.max_steps = max_steps

    def reset(self):
        self.current_step = 0
        observation = [tuple(0 for _ in range(self.num_agents)) for _ in range(self.num_agents)]
        return dict(zip(self.players, observation))

    def step(self, actions_dict):
        actions = np.array(list(actions_dict.values())).flatten()
        reward = np.array([np.random.random() for _ in range(self.num_agents)])
        reward = dict(zip(self.players, reward))
        observation = dict(zip(self.players, [tuple(actions) for i in range(self.num_agents)]))
        self.current_step += 1
        done = {'__all__': self.current_step == self.max_steps}

        info = dict(zip(self.players, [{}]*self.num_agents))
        return observation, reward, done, info

class Rwrapper(gym.RewardWrapper):
    def __init__(self, env):
        self.reward_range = env.reward_range
        super(gym.RewardWrapper, self).__init__(env)
    def reward(self, reward):
        return reward

register_env('NPD', lambda env_config: Rwrapper(Example(**env_config)))
ray.init(num_cpus=4)
trainer = a3c.A3CTrainer(config=config, env='NPD')
trainer.train()

Running the above code block gives me the error shown earlier (the full traceback was attached as an image).

And if I substitute the register_env line with the following line (getting rid of the reward wrapper):

register_env('NPD', lambda env_config: Example(**env_config))

The code executes successfully. I wonder if anyone can help me with this issue. Thank you very much.
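A likely cause, reading the error message, is that a gym.RewardWrapper instance is a gym.Env but not a MultiAgentEnv, so RLlib appears to fall back to its single-agent code path and validate the whole per-agent observation dict against the Tuple observation_space. The quick check below is only a sketch, reusing the Example and Rwrapper classes from the script above, to illustrate the type mismatch:

import gym
from ray.rllib.env import MultiAgentEnv

wrapped = Rwrapper(Example(num_agents=2))

print(isinstance(wrapped, gym.Env))        # True  -> looks like a plain single-agent env to RLlib
print(isinstance(wrapped, MultiAgentEnv))  # False -> RLlib's multi-agent handling is skipped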

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
sven1977 commented, May 8, 2020

Hey @0luhancheng0. I’m very sorry, but we have decided that we don’t want to add support for wrapping MultiAgentEnvs (which are not gym.Env children) with gym.Wrappers. In order to handle reward shaping for our MultiAgentEnvs, could you simply write your own custom wrapper or post-processing function that takes the rewards and manipulates them? Again, sorry for the inconvenience, but we don’t want to add support for something that would become harder to maintain at some point (b/c the MultiAgentEnv API may evolve over time and thus away from the gym.Env API).
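To make that suggestion concrete, here is a minimal sketch of such a custom wrapper, assuming the Example env from the script above; MultiAgentRewardWrapper is a hypothetical name, and the scaling is only a placeholder for whatever shaping you actually need:

from ray.rllib.env import MultiAgentEnv
from ray.tune.registry import register_env

class MultiAgentRewardWrapper(MultiAgentEnv):
    """Reward-shaping wrapper that stays inside the MultiAgentEnv API."""

    def __init__(self, env, scale=1.0):
        self.env = env
        self.scale = scale
        # Mirror the wrapped env's spaces so RLlib sees the same interface.
        self.observation_space = env.observation_space
        self.action_space = env.action_space

    def reset(self):
        return self.env.reset()

    def step(self, action_dict):
        obs, rewards, dones, infos = self.env.step(action_dict)
        # Shape each agent's reward in the reward dict returned by the wrapped env.
        shaped = {agent_id: self.scale * r for agent_id, r in rewards.items()}
        return obs, shaped, dones, infos

# Example is the env class from the original script.
register_env('NPD', lambda env_config: MultiAgentRewardWrapper(Example(**env_config)))

Because the wrapper is itself a MultiAgentEnv, RLlib keeps treating the environment as multi-agent and the observation-space check described above no longer trips.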

1 reaction
sven1977 commented, May 4, 2020

Here is the PR that fixes the problem (test with your script passed). I’m closing this issue, but feel free to re-open it should the problem persist on your end. Thanks again for letting us know about this!

https://github.com/ray-project/ray/pull/8314
