[Bug] Why does `predict` sometimes return `(array, states)` instead of `(action, state)`? Is it a bug?
Important Note: We do not do technical support, nor consulting and don’t answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
If your issue is related to a custom gym environment, please use the custom gym env template.
🐛 Bug
To Reproduce
import numpy as np
import stable_baselines3 as sb
import stable_baselines3.common.base_class
from stable_baselines3.common.env_checker import check_env
import gym, gym.spaces


class MarketEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.simulate_max_steps = 100
        self.action_space = gym.spaces.Discrete(3)
        self.observation_space = gym.spaces.Box(
            low=np.array([0, -np.inf]),
            high=np.array([1, np.inf]),
            dtype=np.float64
        )

    def reset(self):
        self.simulate_max_steps = 100
        return np.array([0, 0.0])

    def step(self, action):
        # Dummy dynamics: constant observation and reward, fixed episode length.
        self.simulate_max_steps -= 1
        obs = np.array([1, 0.0])
        reward = 0.5
        done = self.simulate_max_steps == 0
        return obs, reward, done, {}

    def close(self):
        pass

    def render(self):
        pass


env = MarketEnv()
check_env(env)
model = sb.DQN('MlpPolicy', env, device='cpu')
model.learn(100)


def test_model(model: stable_baselines3.common.base_class.BaseAlgorithm,
               env: MarketEnv, episode_nums: int) -> None:
    for _ in range(episode_nums):
        obs = env.reset()
        done = False
        while not done:
            action, states = model.predict(obs)
            # Print only the steps where the action comes back as an ndarray.
            if isinstance(action, np.ndarray):
                print(type(action), action)
            obs, reward, done, info = env.step(action)
    env.reset()


test_model(model, env, 1)
Output:
<class 'numpy.ndarray'> 1
<class 'numpy.ndarray'> 2
<class 'numpy.ndarray'> 1
<class 'numpy.ndarray'> 2
<class 'numpy.ndarray'> 2
<class 'numpy.ndarray'> 1
Expected behavior
`model.predict()` should return `(action, states)`, not `(np.ndarray, states)`.
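Until the return type is made consistent, calling code that needs a plain Python integer could normalize the action itself. The following is a minimal sketch that uses only NumPy; `as_int_action` is a hypothetical helper name, not part of Stable-Baselines3, and it assumes a discrete action space where each action is a single integer.

```python
import numpy as np


def as_int_action(action) -> int:
    """Hypothetical helper: convert a single discrete action to a Python int.

    Accepts a NumPy scalar (e.g. np.int64), a 0-d array, or a one-element array.
    """
    arr = np.asarray(action)
    if arr.size != 1:
        raise ValueError(f"expected a single action, got shape {arr.shape}")
    return int(arr.item())


# Both forms reported in this issue map to the same plain integer.
print(as_int_action(np.int64(1)))  # NumPy scalar
print(as_int_action(np.array(2)))  # 0-d ndarray
```

In the reproduction above, this would be applied as `env.step(as_int_action(action))`, so the environment always receives the same type.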
### System Info
OS: Windows-10-10.0.19041-SP0 10.0.19041
Python: 3.9.13
Stable-Baselines3: 1.6.0
PyTorch: 1.12.0
GPU Enabled: True
Numpy: 1.21.5
Gym: 0.21.0
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have read the documentation (required)
- I have provided a minimal working example to reproduce the bug (required)
Issue Analytics
- State:
- Created a year ago
- Comments:9 (1 by maintainers)
Top GitHub Comments
Indeed:
The current implementation allows `predict` to return an `np.int64` when the agent is acting greedily and the observation is not vectorized. But `np.int64` is not an `np.ndarray`. This kind of bug should be detected by the pytype check, but for some reason it is not. So I suggest adding a unit test to prevent this kind of bug in the future.

Great, now I can reproduce. I also managed to make your code even more synthetic.
I don’t really know what makes `numpy.int64` and `numpy.ndarray` of dtype `int64` different, but I’m working on it.
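On the question of how the two types differ, a small NumPy-only sketch may help. It does not touch Stable-Baselines3; the variable names are illustrative. A `numpy.int64` is a NumPy scalar (a subclass of `numpy.generic`), while `np.array(2, dtype=np.int64)` is a zero-dimensional `numpy.ndarray`. They hold the same value and report the same shape, but `isinstance` checks tell them apart, which is why the `isinstance(action, np.ndarray)` test in the reproduction only fires for some steps.

```python
import numpy as np

scalar = np.int64(2)                  # NumPy scalar
zero_d = np.array(2, dtype=np.int64)  # 0-dimensional array of dtype int64

# Only the 0-d array is an ndarray; only the scalar is an np.generic.
print(isinstance(scalar, np.ndarray), isinstance(zero_d, np.ndarray))
print(isinstance(scalar, np.generic), isinstance(zero_d, np.generic))

# Both report an empty shape and compare equal by value.
print(scalar.shape, zero_d.shape)
print(bool(scalar == zero_d))
```

A unit test along the lines suggested above could assert `isinstance(action, np.ndarray)` on the value returned by `predict` for both the greedy and the exploratory branch.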