[question] Why am I getting an unexpected activation, even before training?
Note: I understand this is probably not the best place to ask this question. However, I couldn’t find an official or recommended forum for this.
I am creating a custom environment and training an RL agent on it.
My environment has an action space of size 127 and interprets the action as a one-hot vector, taking the index of the highest value in the vector as the input value. For debugging, I create a bar chart showing how many times each value has been “called”.
Before training, I would expect the graph to show a roughly uniform distribution of “events”, but instead the “events” at the lower end of the action space are massively more likely than the others.
I have created a Colab to explain and reproduce the “issue” here.
Here is a short version, in case the Colab doesn’t work:
# Imports assumed from the stable-baselines (v2) API that PPO2 belongs to
import gym
import matplotlib.pyplot as plt
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common import make_vec_env
from stable_baselines.common.env_checker import check_env

note_dims = 127
note_counters = np.zeros((note_dims))

class CustomEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Box(-1, 1, [note_dims], dtype=np.float32)
        self.observation_space = spaces.Box(-1, 1, [10], dtype=float)
        self._state = np.zeros([10])

    def reset(self):
        self._step_count = 0
        self._state = np.zeros([10])
        return self._state

    def render(self, *args):
        pass

    def step(self, action):
        self._step_count += 1
        # map actions from -1,1 to 0,1
        action = action * 0.5 + 0.5
        # np.argmax returns the first index when several entries tie for the maximum
        pitch = np.argmax(action)
        note_counters[pitch] += 1
        reward = 0
        isdone = self._step_count > 500
        observation = self._state
        return observation, reward, isdone, {}

# Make sure the environment is properly configured
env = CustomEnv()
check_env(env, warn=True)

# vectorize the environment
env = make_vec_env(lambda: env, n_envs=1)

# I have tried multiple model architectures here but the results are always the same
model = PPO2('MlpPolicy', env, verbose=1)

obs = env.reset()
note_counters = np.zeros((note_dims))
for step_idx in range(1000):
    action, _states = model.predict(obs)
    # keep the observation in sync with the environment
    obs, _rewards, _dones, _infos = env.step(action)

plt.bar(np.arange(0, note_dims), note_counters)
System info: default Colab instance
Top GitHub Comments
Why do you use a Box space and not a Discrete space?
EDIT: we mention that issue in the documentation: if you use a Box, then the distribution used is a Gaussian, so you won’t get uniform sampling.

All of this is a consequence of the distributions used.
If you take the action vector and reverse it, you will see the same Poisson-like distribution when using an untrained model.
Anyway, this is unrelated to sb.
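
One plausible way to read the comments above can be reproduced with NumPy alone. The sketch below makes two assumptions that are not stated in the thread: that an untrained Gaussian policy emits roughly N(0, 1) noise per action dimension, and that the sampled action is clipped to the Box bounds [-1, 1] before it reaches the environment. Under those assumptions several entries usually tie at exactly 1.0, and np.argmax returns the first of the tied indices, which would favor the low end. The variable names (raw_actions, clipped, counts and so on) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
note_dims = 127
n_steps = 10_000

# Assumption: an untrained Gaussian policy behaves like N(0, 1) noise per dimension.
raw_actions = rng.normal(0.0, 1.0, size=(n_steps, note_dims))

# Assumption: actions are clipped to the Box bounds before the environment sees them.
clipped = np.clip(raw_actions, -1.0, 1.0)

# np.argmax returns the first index among tied maxima, so every entry clipped
# to 1.0 loses to any earlier entry that was also clipped to 1.0.
pitches = np.argmax(clipped, axis=1)
counts = np.bincount(pitches, minlength=note_dims)

# Reversing the vector before argmax moves the same skew to the other end
# once the result is mapped back to the original indices.
reversed_pitches = note_dims - 1 - np.argmax(clipped[:, ::-1], axis=1)
reversed_counts = np.bincount(reversed_pitches, minlength=note_dims)

# Without clipping there are no ties, and the argmax is spread roughly evenly.
unclipped_counts = np.bincount(np.argmax(raw_actions, axis=1), minlength=note_dims)

print("clipped, first 10 indices:  ", counts[:10].sum())
print("reversed, last 10 indices:  ", reversed_counts[-10:].sum())
print("unclipped, first 10 indices:", unclipped_counts[:10].sum())
```

If this reading is right, the skew comes from the interaction between clipping and argmax tie-breaking rather than from the policy network itself, which fits the remark that the effect is unrelated to sb. A Discrete action space, as suggested in the first comment, would avoid the argmax step entirely.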