[BUG] GymWrapper does not work with nested observation gym.spaces.Dict

Describe the bug

Hi All,

First of all: thanks for the great work here!

I think I have encountered a bug in the GymWrapper in torchrl.envs.libs.gym.GymWrapper. When I use a gym.Env whose observation space contains a nested gym.spaces.Dict, a KeyError is thrown because GymLikeEnv.read_obs() only adds “next_” to the first level of the Dict, not to nested sub-Dicts:

observations = {"next_" + key: value for key, value in observations.items()}

Since _gym_to_torchrl_spec_transform() in torchrl.envs.libs.gym adds “next_” in a recursive call to all sub-Dicts, the nested observation keys lack the “next_” prefix that the spec expects. Nested Dict observation spaces are often used (https://www.gymlibrary.dev/api/spaces/#dict), so I guess this is required to work properly.
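The mismatch described above can be reproduced with plain dictionaries, without torchrl or gym. This is only a sketch: prefix_top_level copies the one-level rename quoted above, while prefix_recursive and key_paths are hypothetical helpers that stand in for the recursive renaming attributed to the spec transform. They are not torchrl functions.

```python
from typing import Any, Dict, Set, Tuple


def prefix_top_level(obs: Dict[str, Any]) -> Dict[str, Any]:
    """One-level rename, as quoted above: only outer keys get "next_"."""
    return {"next_" + key: value for key, value in obs.items()}


def prefix_recursive(obs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the spec side: "next_" at every level."""
    return {
        "next_" + key: prefix_recursive(value) if isinstance(value, dict) else value
        for key, value in obs.items()
    }


def key_paths(d: Dict[str, Any], parent: Tuple[str, ...] = ()) -> Set[Tuple[str, ...]]:
    """Collects the key path (as a tuple) of every leaf in a nested dict."""
    paths: Set[Tuple[str, ...]] = set()
    for key, value in d.items():
        if isinstance(value, dict):
            paths |= key_paths(value, parent + (key,))
        else:
            paths.add(parent + (key,))
    return paths


# A reduced version of the nested observation from the reproduction script.
obs = {"sensor_3": 0.0, "sensor_4": {"sensor_41": 1.0, "sensor_42": 2.0}}

spec_paths = key_paths(prefix_recursive(obs))  # what the spec would expect
obs_paths = key_paths(prefix_top_level(obs))   # what read_obs would produce

# Paths the spec expects but the renamed observation does not provide.
missing = sorted(spec_paths - obs_paths)
print(missing)
```

The two missing paths are exactly the nested ones, which is consistent with the KeyError appearing only for the sub-Dict.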

To Reproduce

#!/usr/bin/env python
from torchrl.envs.libs.gym import GymWrapper
from gym import spaces, Env
import numpy as np


class CustomGym(Env):
    def __init__(self):
        self.action_space = spaces.Discrete(5)
        self.observation_space = spaces.Dict(
            {
                'sensor_1': spaces.Box(low=0, high=255, shape=(5, 5, 3), dtype=np.uint8),
                'sensor_2': spaces.Box(low=0, high=255, shape=(5, 5, 3), dtype=np.uint8),
                'sensor_3': spaces.Box(np.array([-2, -1, -5, 0]), np.array([2, 1, 30, 1]), dtype=np.float32),
                'sensor_4': spaces.Dict({'sensor_41': spaces.Box(low=0, high=100, shape=(1,), dtype=np.float32),
                                         'sensor_42': spaces.Box(low=0, high=100, shape=(1,), dtype=np.float32),
                                         'sensor_43': spaces.Box(low=0, high=100, shape=(1,), dtype=np.float32)})
            }
        )

    def reset(self):
        return self.observation_space.sample()


if __name__ == '__main__':
    env = CustomGym()
    env = GymWrapper(env)

Reason and Possible fixes

The issue can be fixed by renaming nested observation Dicts recursively in GymLikeEnv.read_obs(), so that “next_” is also added to their keys:


    def read_obs(
        self, observations: Union[Dict[str, Any], torch.Tensor, np.ndarray]
    ) -> Dict[str, Any]:
        """Reads an observation from the environment and returns an observation compatible with the output TensorDict.

        Args:
            observations (observation under a format dictated by the inner env): observation to be read.

        """
        if isinstance(observations, dict):

            def rename(obs):
                return {
                    "next_" + key: rename(value) if isinstance(value, dict) else value
                    for key, value in obs.items()
                }

            observations = rename(observations)
        if not isinstance(observations, (TensorDict, dict)):
            key = list(self.observation_spec.keys())[0]
            observations = {key: observations}
        observations = self.observation_spec.encode(observations)
        return observations

The style checker does not allow lambda functions; otherwise the fix could be as simple as

            rename = lambda obs: {
                "next_" + key: rename(value) if isinstance(value, dict) else value
                for key, value in obs.items()
            }

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

Top GitHub Comments

raphajaner commented, Nov 3, 2022

Yes sure, I’ll take care of it 😃 Thanks for the feedback!

vmoens commented, Nov 4, 2022

Hey! I see your point. We’re thinking about redesigning this API. I will open a PR with that shortly, but I’d be glad to get your thoughts about it.

First, I think the "next_obs" naming is messy and makes it hard to get the tensordict of the next step. Second, it does not scale well to other problems (e.g. MCTS or planners in general, where we explore many different possible actions for a single state). Finally, it requires users to remember to name the obs in the specs with the "next" prefix, which they might forget or find cumbersome.

Here’s what I have in mind. Before, env.step returns:

TensorDict({
  "state": stuff,
  "reward": reward,
  "done": done,
  "action": action,
  "next_state": stuff,
  "other": foo,
}, [])

We would change that to:

TensorDict({
  "state": stuff,
  "reward": reward,
  "done": done,
  "action": action,
  "next": TensorDict({
      "state": stuff,
    }, []),
  "other": foo,
}, [])

That way, step_mdp just needs to do tensordict = tensordict["next"].clone(recurse=False) (we clone it; otherwise the original tensordict would keep track of the whole trajectory!). If you like the previous API, you can just call tensordict.flatten_keys("_").
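To make the proposal concrete, here is a rough sketch of both operations using plain Python dicts as stand-ins for TensorDict. The names step_mdp_sketch and flatten_keys_sketch are hypothetical and only illustrate the idea; they are not the torchrl or tensordict implementations, and the values are placeholders.

```python
from typing import Any, Dict


def step_mdp_sketch(td: Dict[str, Any]) -> Dict[str, Any]:
    """Returns a shallow copy of the nested "next" entry as the new root.

    The copy means later writes to the new root do not modify the
    "next" entry stored in the previous step's dict.
    """
    return dict(td["next"])


def flatten_keys_sketch(
    td: Dict[str, Any], separator: str = "_", prefix: str = ""
) -> Dict[str, Any]:
    """Joins nested keys with `separator`, e.g. next -> state becomes next_state."""
    out: Dict[str, Any] = {}
    for key, value in td.items():
        name = prefix + separator + key if prefix else key
        if isinstance(value, dict):
            out.update(flatten_keys_sketch(value, separator, name))
        else:
            out[name] = value
    return out


# Placeholder values following the nested layout proposed above.
td = {
    "state": 0,
    "reward": 1.0,
    "done": False,
    "action": 2,
    "next": {"state": 1},
    "other": "foo",
}

next_td = step_mdp_sketch(td)   # root for the next step
flat = flatten_keys_sketch(td)  # recovers the old "next_state" style keys
print(next_td)
print(sorted(flat))
```

With this layout the "next" prefix never has to be written into individual observation names; it only reappears if the user explicitly flattens the keys.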

So in your case you’d have this

TensorDict({
  "state": stuff,
  "reward": reward,
  "done": done,
  "action": action,
  "camera": cam,
  "next": TensorDict({
      "state": stuff,
      "camera": cam,
    }, []),
  "other": foo,
}, [])

Thoughts?

cc @shagunsodhani (by the way, it’s funny that we were just talking about that feature a couple of hours ago and @raphajaner came up with a very similar idea!)
