All environments produce observations outside of observation space.
The following is a minimal working example showing that all of the environments produce observations outside of their observation space. It iterates over each environment from ML1, samples and sets a task for that environment, then takes random actions and tests whether each observation lies inside the observation space, recording the indices (if any) at which an observation falls outside the bounds. Results depend on the value of TIMESTEPS_PER_ENV, but setting it to 1000 should yield violating observations for most environments. This is a problem for RL libraries such as RLlib, which expect observations to lie inside the observation space, and it makes the environments incompatible with them. This might be related to issue #31, though that issue only points out incorrect observation space boundaries for the goal coordinates, and the script below shows that there are violations in other dimensions as well.
import numpy as np
from metaworld.benchmarks import ML1

TIMESTEPS_PER_ENV = 1000


def main():

    # Iterate over environment names.
    for env_name in ML1.available_tasks():

        # Create environment.
        env = ML1.get_train_tasks(env_name)
        tasks = env.sample_tasks(1)
        env.set_task(tasks[0])

        # Get boundaries of observation space and initial observation.
        low = env.observation_space.low
        high = env.observation_space.high
        obs = env.reset()

        # Create list of indices of observation space whose bounds are violated.
        broken_indices = []

        # Run environment.
        for _ in range(TIMESTEPS_PER_ENV):

            # Test if observation is outside observation space.
            if np.any(np.logical_or(obs < low, obs > high)):
                current_indices = np.argwhere(np.logical_or(obs < low, obs > high))
                current_indices = current_indices.reshape((-1,)).tolist()
                for current_index in current_indices:
                    if current_index not in broken_indices:
                        broken_indices.append(current_index)

            # Sample action and perform environment step.
            a = env.action_space.sample()
            obs, reward, done, info = env.step(a)

        # Print out which indices of observation space were violated.
        broken_indices = sorted(broken_indices)
        print("%s broken indices: %r" % (env_name, broken_indices))


if __name__ == "__main__":
    main()
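The index bookkeeping in the script above can also be expressed as a single vectorized check over a batch of collected observations. The following is a minimal sketch using only NumPy. The helper name `violated_indices` and the toy bounds are illustrative assumptions and are not part of Metaworld.

```python
import numpy as np


def violated_indices(observations, low, high):
    """Return the sorted indices at which any observation leaves [low, high]."""
    # Stack observations into shape (num_steps, obs_dim).
    observations = np.atleast_2d(np.asarray(observations, dtype=float))
    # Boolean mask of out-of-bounds entries, one row per observation.
    outside = (observations < low) | (observations > high)
    # An index counts as broken if it was violated in at least one step.
    return np.flatnonzero(outside.any(axis=0)).tolist()


# Toy bounds and observations (hypothetical values, not taken from Metaworld).
low = np.array([-1.0, -1.0, 0.0])
high = np.array([1.0, 1.0, 2.0])
observations = [
    [0.5, 0.0, 1.0],   # inside the bounds
    [0.5, 1.5, 1.0],   # index 1 is above high
    [0.5, 0.0, -0.1],  # index 2 is below low
]
print(violated_indices(observations, low, high))  # [1, 2]
```

Collecting the observations first and checking them in one call avoids the repeated `not in` list lookups, at the cost of holding every observation in memory.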
Issue Analytics
- Created 4 years ago
- Comments: 27 (19 by maintainers)
Top GitHub Comments
The observation spaces were written and tested for the case where the environments are fully observable. When they are partially observable, the last 3 elements of the observation get zeroed out, which is what’s happening here… and then the observation space is incorrect. I can push a fix tonight.
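The effect described in this comment can be reproduced with toy numbers. The sketch below is only an illustration under stated assumptions: the six-dimensional bounds, the observation values, and the `contains` helper are hypothetical, and widening the bounds to include zero is one possible adjustment, not necessarily the fix the maintainers pushed.

```python
import numpy as np

# Hypothetical bounds; the last 3 entries stand in for the goal coordinates.
low = np.array([-0.5, 0.4, 0.0, -0.1, 0.8, 0.05])
high = np.array([0.5, 1.0, 0.5, 0.1, 0.9, 0.30])


def contains(obs, low, high):
    """Return True if every entry of obs lies within [low, high]."""
    return bool(np.all((obs >= low) & (obs <= high)))


# Fully observable case: the goal coordinates are present in the observation.
full_obs = np.array([0.0, 0.6, 0.2, 0.0, 0.85, 0.1])

# Partially observable case: the last 3 elements are zeroed out.
partial_obs = full_obs.copy()
partial_obs[-3:] = 0.0

print(contains(full_obs, low, high))     # True
print(contains(partial_obs, low, high))  # False

# One possible adjustment (an assumption): widen the goal bounds to include 0.
patched_low = low.copy()
patched_high = high.copy()
patched_low[-3:] = np.minimum(low[-3:], 0.0)
patched_high[-3:] = np.maximum(high[-3:], 0.0)

print(contains(partial_obs, patched_low, patched_high))  # True
```

Bounds derived from the range of possible goal positions do not need to include the origin, which is why a zeroed goal can fall outside them.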
@haydenshively