[question] Why am I getting an unexpected activation, even before training?

Note: I understand this is probably not the best place to ask this question. However, I couldn’t find an official or recommended forum for this…

I am working on creating a custom environment and training an RL agent on it.

My environment has an action space of size 127 and interprets each action like a one-hot vector: the index of the highest value in the vector is taken as the input value. For debugging, I create a bar chart showing how many times each value has been “called”.

Before training, I would expect the graph to show a roughly uniform distribution of “events”. Instead, the “events” at the lower end of the action space are massively more likely than the others.
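
The uniform expectation above can be probed without any RL library. The sketch below rests on two assumptions that are not confirmed by anything in this post: that an untrained policy behaves roughly like unit-variance Gaussian noise, and that action values are clipped to the Box bounds before they reach step(). The helper name argmax_histogram is made up for illustration. It relies on the documented behavior of np.argmax, which returns the first index when several entries tie for the maximum.

```python
import numpy as np


def argmax_histogram(samples: np.ndarray) -> np.ndarray:
    """Count how often each index wins np.argmax across the rows of `samples`."""
    # On ties, np.argmax returns the lowest index holding the maximum.
    winners = np.argmax(samples, axis=1)
    return np.bincount(winners, minlength=samples.shape[1])


rng = np.random.default_rng(0)
n_steps, note_dims = 1000, 127

# Assumption: an untrained policy acts like unit-variance Gaussian noise.
raw = rng.standard_normal((n_steps, note_dims))
# Assumption: values are clipped to the Box bounds before reaching step().
clipped = np.clip(raw, -1.0, 1.0)

raw_counts = argmax_histogram(raw)
clipped_counts = argmax_histogram(clipped)

# Compare how much of the histogram mass lands on the ten lowest indices.
print("share of wins for indices 0-9, raw:    ", raw_counts[:10].sum() / n_steps)
print("share of wins for indices 0-9, clipped:", clipped_counts[:10].sum() / n_steps)
```

If the clipped histogram resembles the bar chart while the raw one stays roughly flat, ties at the upper bound would be a plausible cause of the skew. This does not prove that the library behaves this way; it only shows that clipping followed by argmax is enough to produce such a pattern.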

I have created a Colab notebook to explain and reproduce the “issue” here.

Here is a short version, in case the Colab doesn’t work:

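# Imports inferred from the names used below (assumes stable-baselines 2.x)
import gym
import matplotlib.pyplot as plt
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.env_checker import check_env

note_dims = 127
note_counters = np.zeros(note_dims)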

class CustomEnv(gym.Env):
  def __init__(self):
    self.action_space = spaces.Box(-1, 1, [note_dims], dtype=np.float32)
    self.observation_space = spaces.Box(-1, 1, [10], dtype=float)
    self._state = np.zeros([10])

  def reset(self):
    self._step_count = 0
    self._state = np.zeros([10])
    return self._state

  def render(self, *args):
    pass

  def step(self, action):
    self._step_count += 1

    # Map actions from [-1, 1] to [0, 1]; this rescaling does not change the argmax
    action = action * 0.5 + 0.5

    pitch = np.argmax(action)
    note_counters[pitch] += 1

    reward = 0
    isdone = self._step_count > 500
    observation = self._state

    return observation, reward, isdone, {}

# Make sure the environment is properly configured
env = CustomEnv()
check_env(env, warn=True)
# vectorize the environment
env = make_vec_env(lambda: env, n_envs=1)

# I have tried multiple model architectures here but the results are always the same
model = PPO2('MlpPolicy', env, verbose=1)

obs = env.reset()
note_counters = np.zeros((note_dims))

for step_idx in range(1000):
  action, _states = model.predict(obs)
  # Keep the latest observation so the next prediction uses it
  obs, _rewards, _dones, _infos = env.step(action)

plt.bar(np.arange(0, note_dims), note_counters)

System info: default Colab instance