[Bug] Why does `predict` sometimes return `(array, states)` instead of `(action, state)`? Is it a bug?

🐛 Bug

To Reproduce

import numpy as np
import stable_baselines3 as sb
import stable_baselines3.common.base_class
from stable_baselines3.common.env_checker import check_env
import gym, gym.spaces


class MarketEnv(gym.Env):
    def __init__(self):
        super().__init__()

        self.simulate_max_steps = 100

        self.action_space = gym.spaces.Discrete(3)
        self.observation_space = gym.spaces.Box(
            low=np.array([ 0, -np.inf]),
            high=np.array([1,  np.inf]),
            dtype=np.float64
        )

    def reset(self):
        self.simulate_max_steps = 100

        return np.array([0, 0.0])

    def step(self, action):
        self.simulate_max_steps -= 1

        obs = np.array([1, 0.0])
        reward = 0.5
        done = self.simulate_max_steps == 0
        return obs, reward, done, { }

    def close(self):
        pass

    def render(self):
        pass

env = MarketEnv()
check_env(env)

model = sb.DQN('MlpPolicy', env, device='cpu')
model.learn(100)

def test_model(model: stable_baselines3.common.base_class.BaseAlgorithm,
               env: MarketEnv, episode_nums: int) -> None:
    for _ in range(episode_nums):
        obs = env.reset()
        done = False

        while not done:
            action, states = model.predict(obs)
            if isinstance(action, np.ndarray):
                print(type(action), action)
            obs, reward, done, info = env.step(action)

    env.reset()


test_model(model, env, 1)

Output:

<class 'numpy.ndarray'> 1
<class 'numpy.ndarray'> 2
<class 'numpy.ndarray'> 1
<class 'numpy.ndarray'> 2
<class 'numpy.ndarray'> 2
<class 'numpy.ndarray'> 1

Expected behavior

model.predict() should return (action, states), not (np.ndarray, states).

System Info

OS: Windows-10-10.0.19041-SP0 10.0.19041
Python: 3.9.13
Stable-Baselines3: 1.6.0
PyTorch: 1.12.0
GPU Enabled: True
Numpy: 1.21.5
Gym: 0.21.0

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

Issue Analytics

Created a year ago
Comments: 9 (1 by maintainers)

Top GitHub Comments

qgallouedec commented, Jul 15, 2022

The difference in types comes from the epsilon-greedy policy.

Indeed, the current implementation allows predict to return an np.int64 when the agent acts greedily and the observation is not vectorized. But np.int64 is not an np.ndarray. This kind of bug should be caught by the pytype check, but for some reason it is not, so I suggest adding a unit test to prevent this kind of bug in the future.
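The mismatch described in this comment can be illustrated with a toy epsilon-greedy function. This is a minimal sketch in plain NumPy, not the Stable-Baselines3 implementation: `toy_predict`, `toy_predict_fixed`, and `q_values` are hypothetical names, and the only assumption is that the exploring branch wraps the sampled action in an array while the greedy branch returns the bare argmax.

```python
import numpy as np


def toy_predict(q_values: np.ndarray, epsilon: float, rng: np.random.Generator):
    """Hypothetical epsilon-greedy selection whose branches return different types."""
    if rng.random() < epsilon:
        # Exploring branch: the sampled action is wrapped in a 0-d array.
        return np.array(rng.integers(len(q_values)))
    # Greedy branch: argmax of a 1-D array yields a NumPy integer scalar.
    return np.argmax(q_values)


def toy_predict_fixed(q_values: np.ndarray, epsilon: float, rng: np.random.Generator):
    """Same logic, with both branches normalized to a 0-d ndarray."""
    return np.asarray(toy_predict(q_values, epsilon, rng))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.2])

# epsilon = 1.0 always explores, epsilon = 0.0 always acts greedily.
explore_action = toy_predict(q_values, 1.0, rng)
greedy_action = toy_predict(q_values, 0.0, rng)
print(type(explore_action), type(greedy_action))

# The kind of check a unit test could make: the return type must not
# depend on which branch was taken.
for eps in (0.0, 1.0):
    assert isinstance(toy_predict_fixed(q_values, eps, rng), np.ndarray)
```

A real regression test would call `model.predict` repeatedly on a non-vectorized observation and assert on the returned type in the same way.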

qgallouedec commented, Jul 15, 2022

"I have updated my original issue and added a minimal example to reproduce."

Great, now I can reproduce. I also managed to make your code even more synthetic.

import gym
from stable_baselines3 import DQN

env = gym.make("CartPole-v1")

model = DQN("MlpPolicy", env)
model.learn(100)

obs = env.reset()
done = False

while not done:
    action, states = model.predict(obs)
    print(type(action), action.dtype)
    obs, reward, done, info = env.step(action)

Output:

<class 'numpy.int64'> int64
<class 'numpy.int64'> int64
<class 'numpy.int64'> int64
<class 'numpy.ndarray'> int64
<class 'numpy.ndarray'> int64
<class 'numpy.int64'> int64
<class 'numpy.int64'> int64
<class 'numpy.int64'> int64
<class 'numpy.int64'> int64
<class 'numpy.int64'> int64
<class 'numpy.int64'> int64
<class 'numpy.int64'> int64

I don’t really know what makes numpy.int64 and numpy.ndarray of dtype int64 different, but I’m working on it.
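For readers with the same question, the difference can be seen in plain NumPy. The sketch below is not taken from the issue: `scalar`, `zero_dim`, and `to_int_action` are illustrative names. A `numpy.int64` is a NumPy scalar, whereas `np.array(1, dtype=np.int64)` is a zero-dimensional `numpy.ndarray`. Both report dtype int64 and an empty shape, but they are different Python types.

```python
import numpy as np

scalar = np.int64(1)                    # NumPy scalar
zero_dim = np.array(1, dtype=np.int64)  # zero-dimensional array

# Both report the same dtype and an empty shape ...
print(scalar.dtype, zero_dim.dtype)  # int64 int64
print(scalar.shape, zero_dim.shape)  # () ()

# ... but only one of them is an ndarray.
print(isinstance(scalar, np.ndarray))    # False
print(isinstance(zero_dim, np.ndarray))  # True


def to_int_action(action) -> int:
    """Hypothetical caller-side helper: turn either form into a plain Python int."""
    return int(np.asarray(action).item())


print(to_int_action(scalar), to_int_action(zero_dim))  # 1 1
```

Until the return type is made consistent upstream, a helper along these lines lets calling code pass the same kind of value to `env.step` regardless of which form `predict` returned.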
