AssertionError: The observation space must inherit from gym.spaces cf https://github.com/openai/gym/blob/master/gym/spaces/
Describe the bug
Hi all, I am using stable-baselines for a policy optimisation program pertaining to a drug distribution problem. I have made a custom environment following the gym interface, using the guide given at [https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/master/5_custom_gym_env.ipynb#scrollTo=1CcUVatq-P0l], and tried to validate it using the check_env() method.
I am unable to understand and fix the error described below.
Code example
This is the code I made:
%tensorflow_version 1.x
!pip install stable-baselines[mpi]==2.10.0

import numpy as np
import gym
from gym import spaces
import matplotlib.pyplot as plt
from stable_baselines.common.env_checker import check_env

class StatesEnv(gym.Env):
    """
    Customised environment that follows the gym interface.
    Describes relevant properties of the state and action spaces.
    """
    metadata = {'render.modes': ['human']}
    # states = 6      # Delhi, Guj, Raja, MP, Maha, TN
    # properties = 5

    def __init__(self, s, prop, episodes):
        # initialise state properties and their values, obsn space and action space???
        self.states = s  # no of independent simulations to be run
        self.properties = prop
        # observation will be the condition of a state at a particular time
        self.observation_space = np.array(spaces.Box(low=np.zeros((s, prop)),
                                                     high=np.full((s, prop), float('inf')),
                                                     shape=(s, prop), dtype=np.float32))
        # actions are vectors of the form [n1, n2, n3, ..., nk, r]
        # for k states and r reserved amount of drug
        self.action_space = np.array(spaces.Box(low=np.zeros((s + 1,), dtype=int),
                                                high=np.array([100] * (s + 1)),
                                                shape=(s + 1,), dtype=np.uint8))
        # sum = 0
        # for i in range(s + 1):
        #     sum += self.action_space[i]
        # assert sum == 100  # returns error if total % is not 100
        self.m = []
        self.prob_dying = []
        self.episodes = episodes

    def reset(self):
        """
        Resets observation_space to a matrix initialising the situation of the states
        wrt the current figures; action_space to start exploring from the point of
        equal distribution between all states.
        """
        self.action_space = np.array([100 / (self.states + 1)] * (self.states + 1))
        self.observation_space = np.array([[80188, 28329, 2558, 16787941, 0.03190003492],
                                           [30709, 6511, 1789, 60439692, 0.05825653717],
                                           [16944, 3186, 391, 68548437, 0.02307601511],
                                           [12965, 2444, 550, 72626809, 0.04242190513],
                                           [159133, 67615, 7273, 112374333, 0.04570390805],
                                           [78335, 33216, 1025, 72147030, 0.01308482798]])
        # Columns: Confirmed, Active, Deaths, Population, P(dying)
        # Rows: Delhi, Guj, Raja, MP, Maha, TN

    def step(self, action, total):
        """
        Assumptions:
        1. Drug has 50% efficacy.
        2. Vaccine is passive, not antigen based - works to fight off existing infection.
        3. 1 person requires 1 vial (dose) only.
        """
        P = self.observation_space
        # total number of vials available
        self.total = total
        # no of units distributed to respective states
        received = []
        for i in range(self.states):
            received[i] = self.total * action[i] / 100
        # add column of units distributed per state to update observation space
        P = np.append(P, received, axis=1)
        # measuring the effect of the drug on each state;
        # m is the no of ppl moving from active to recovered
        for i in range(self.states):
            self.m[i] = 0.5 * received[i]  # 50% efficacy
            P[i, 1] -= self.m[i]
            self.prob_dying[i] = P[i, 2] / P[i, 0]
        # task is done when all states show a decrease in probability of dying
        done = bool(P[i] < self.observation_space[i, 4] for i in range(self.states))
        # reward only when the task is done
        reward = 10 if done else 0
        # optionally we can pass additional info
        info = {self.prob_dying}
        return P, reward, done, info

    def render(self, mode='human'):
        x = self.episodes
        y = []
        for i in range(self.states):
            y[i] = self.prob_dying[i]
            y.append(y[i])
            plt.plot(x, y[i])
        plt.xlabel('Number of episodes')
        plt.ylabel('P(dying) of state')
        plt.title('Learning Process')
        plt.show()

    def close(self):
        pass

env = StatesEnv(6, 5, 25000)
# If the environment doesn't follow the interface, an error will be thrown
check_env(env, warn=True)
The error trace is as follows:
/usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-25-b204228c79f7> in <module>()
1 env = StatesEnv(6, 5, 25000)
2 # If the environment don't follow the interface, an error will be thrown
----> 3 check_env(env, warn=True)
1 frames
/usr/local/lib/python3.6/dist-packages/stable_baselines/common/env_checker.py in check_env(env, warn, skip_render_check)
183
184 # ============= Check the spaces (observation and action) ================
--> 185 _check_spaces(env)
186
187 # Define aliases for convenience
/usr/local/lib/python3.6/dist-packages/stable_baselines/common/env_checker.py in _check_spaces(env)
132
133 assert isinstance(env.observation_space,
--> 134 spaces.Space), "The observation space must inherit from gym.spaces" + gym_spaces
135 assert isinstance(env.action_space, spaces.Space), "The action space must inherit from gym.spaces" + gym_spaces
136
AssertionError: The observation space must inherit from gym.spaces cf https://github.com/openai/gym/blob/master/gym/spaces/
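For context on why this assertion fires: numpy.array() applied to a Box returns a zero-dimensional object array that merely contains the Box, so the result no longer inherits from gym.spaces.Space. A minimal sketch of the failing check (variable names here are illustrative):

import numpy as np
from gym import spaces

box = spaces.Box(low=0.0, high=np.inf, shape=(6, 5), dtype=np.float32)
wrapped = np.array(box)  # 0-d object array that contains the Box

print(isinstance(box, spaces.Space))      # True
print(isinstance(wrapped, spaces.Space))  # False -- this is what the assertion trips on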
System Info
Describe the characteristic of your environment:
- Installation:
%tensorflow_version 1.x
!pip install stable-baselines[mpi]==2.10.0
- GPU models and configuration: none used
- Python version:
import sys
sys.version
gives:
3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0]
- Tensorflow version:
TensorFlow 1.x selected.
- Versions of any other relevant libraries: none
Additional context
I am not entirely sure here, but the problem may stem from the fact that I wanted my observation_space and action_space to be arrays, and so converted the Box type by passing them to numpy.array(). I am not sure if that's the right way to do it, and it'll be great if someone could clarify this as well!
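For reference, here is a minimal sketch of the conventional pattern, assuming the same gym API as above: the spaces remain gym.spaces.Box objects that only describe shapes and bounds, while the actual values live in a separate attribute (self.state is a hypothetical name chosen for illustration):

import numpy as np
import gym
from gym import spaces

class StatesEnvSketch(gym.Env):
    """Illustrative only: spaces stay gym.spaces objects; values live elsewhere."""

    def __init__(self, s, prop):
        super().__init__()
        # Spaces declare shapes and bounds; no np.array() wrapping.
        self.observation_space = spaces.Box(low=0.0, high=np.inf,
                                            shape=(s, prop), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=100.0,
                                       shape=(s + 1,), dtype=np.float32)
        self.state = None  # the current observation (an ndarray), not a space

    def reset(self):
        # reset() returns the first observation; the spaces are never reassigned.
        self.state = np.zeros(self.observation_space.shape, dtype=np.float32)
        return self.state

With this layout the isinstance assertions in check_env pass, and reset() hands back an observation rather than replacing a space.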
Top GitHub Comments
That is changing the spaces; you should not assign anything to observation_space/action_space after they are initially defined. reset should return the initial values. These are questions that are outside stable-baselines and are well documented in the docs and in OpenAI Gym. I am closing this issue.
@Miffyli the purpose of defining the starting values of the spaces in reset is merely to initialise the value of the spaces from where the exploration-exploitation should start operating and the model begins to train. Umm, I am not sure if that counts as changing the spaces…?
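To make that concrete: assigning a plain ndarray to action_space, as the reset() in the original code does, rebinds the attribute and discards the Box entirely. A small illustration, assuming the same gym version:

import numpy as np
from gym import spaces

action_space = spaces.Box(low=0.0, high=100.0, shape=(7,), dtype=np.float32)
print(isinstance(action_space, spaces.Space))  # True

action_space = np.array([100 / 7] * 7)  # the pattern used in reset() above
print(isinstance(action_space, spaces.Space))  # False: the Box has been replaced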