
Why does SB3's DQN fail on a custom environment but SB2's DQN does not?

See original GitHub issue

There are several issues related to the performance of SB2 and SB3, such as this one. Here, I am specifically focusing on DQN’s behavior. I am using a custom environment (simple 4x4 grid world where the goal is to get from one cell to another). I am using the equivalent code in SB2 and SB3 to train and evaluate the RL model/algorithm.
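The environment itself is not included in the issue. For context, here is a minimal sketch of what such a 4x4 grid world could look like (the class name, one-hot observation encoding, and sparse 0/1 reward below are assumptions for illustration, not the author's actual environment):

import gym
import numpy as np
from gym import spaces

class SimpleGridWorld(gym.Env):
    """Hypothetical 4x4 grid world: start in one corner, reach the opposite corner."""

    def __init__(self, size=4):
        super().__init__()
        self.size = size
        self.action_space = spaces.Discrete(4)  # up, down, left, right
        # Observation: the agent's current cell, one-hot encoded
        self.observation_space = spaces.Box(low=0, high=1, shape=(size * size,), dtype=np.float32)
        self.pos = 0
        self.goal = size * size - 1

    def _obs(self):
        obs = np.zeros(self.size * self.size, dtype=np.float32)
        obs[self.pos] = 1.0
        return obs

    def reset(self):
        self.pos = 0
        return self._obs()

    def step(self, action):
        row, col = divmod(self.pos, self.size)
        if action == 0:
            row = max(row - 1, 0)              # up
        elif action == 1:
            row = min(row + 1, self.size - 1)  # down
        elif action == 2:
            col = max(col - 1, 0)              # left
        else:
            col = min(col + 1, self.size - 1)  # right
        self.pos = row * self.size + col
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0          # sparse reward: 1 only when the goal is reached
        return self._obs(), reward, done, {}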

Specifically, this is the code I am using with SB2

from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.bench.monitor import Monitor
from stable_baselines.results_plotter import X_TIMESTEPS, plot_results
from stable_baselines.deepq.dqn import DQN
from stable_baselines.deepq.policies import MlpPolicy

...

model = DQN(MlpPolicy, env, verbose=1, exploration_fraction=0.1)
model.learn(total_timesteps=20000)
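evaluate_policy is imported above but its call is elided; a minimal sketch of how it is typically used (the number of evaluation episodes is an arbitrary choice here):

# Evaluate the trained policy over a few episodes, acting deterministically
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")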

The (supposedly) equivalent SB3 code differs only in the imports, which are

from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.results_plotter import X_TIMESTEPS, plot_results
from stable_baselines3.dqn.dqn import DQN
from stable_baselines3.dqn.policies import MlpPolicy
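Putting those imports together with the same training call, and adding the Monitor/plot_results bookkeeping mentioned below, the SB3 version presumably looks something like this (the log directory path is an assumption):

import os

log_dir = "./dqn_gridworld_logs/"  # hypothetical log directory
os.makedirs(log_dir, exist_ok=True)

# env is assumed to be an instance of the custom grid-world environment.
# Wrap it so episode rewards and lengths are written to a monitor.csv file.
env = Monitor(env, filename=log_dir)

model = DQN(MlpPolicy, env, verbose=1, exploration_fraction=0.1)
model.learn(total_timesteps=20000)

# Plot episode reward against the number of timesteps recorded by Monitor
plot_results([log_dir], 20000, X_TIMESTEPS, "DQN on the 4x4 grid world")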

With SB2, after training, my model regularly achieves the best performance (it reaches the goal location and collects the highest reward). With SB3, on the other hand, the model is never able to reach the goal during evaluation, whether I use the same number of time steps or increase it. I am not sure why. Clearly, there are big differences between SB2 and SB3 beyond SB2 using TF 1 and SB3 using PyTorch. That said, during training the SB3 implementation does eventually reach the goal location, judging by the rewards I plot with plot_results after recording them with Monitor. During evaluation, however, it sometimes just gets stuck repeating the same, apparently invalid, action.

Here’s the code used during evaluation to take actions (for both SB2 and SB3).

action, hidden_states = model.predict(next_observation, deterministic=True)
next_observation, reward, done, info = env.step(action)

(Also, weirdly enough, sometimes done = True but the final reward is zero, even though it should be 1 in that case; that is a separate issue, though.)
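Put together, the evaluation rollout presumably looks something like the following (a sketch assuming a single episode with an arbitrary step cap so a stuck policy does not loop forever):

# Roll out one greedy episode with the trained model
next_observation = env.reset()
total_reward = 0.0
for _ in range(100):  # hypothetical safety cap
    action, hidden_states = model.predict(next_observation, deterministic=True)
    next_observation, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
print("episode reward:", total_reward)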


Update 1

Actually, the SB2 version now also fails. This started after I installed SB3 as well, but I have since created a new environment without SB3, so SB3 itself is not the issue. I know these algorithms are stochastic, but it seems strange that the results can be completely different from one run to the next in such a simple environment, after so many time steps (20000).

I have now increased the number of time steps to 30000, and the SB2 version seems to work again (but maybe it fails again in a moment, lol). Btw, the SB3 version still fails.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
araffin commented, Nov 16, 2020

Hello,

Short answer: as mentioned several times, we do not do tech support. Please read the RL Tips and Tricks and the migration guide carefully next time, and please also use the issue template 😉

Long answer:

There are several issues related to the performance of SB2 and SB3, such as this one.

Performance was checked.

I am using the equivalent code in SB2 and SB3 to train and evaluate the RL model/algorithm.

Please read the migration guide

SB2 and SB3 DQN are quite different if you use the default hyperparameters.
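For illustration, here is a sketch of setting some of SB3 DQN's hyperparameters explicitly instead of relying on the defaults (the values are placeholders to show the relevant arguments, not tuned settings from this thread):

model = DQN(
    MlpPolicy,
    env,
    learning_rate=1e-3,          # placeholder value
    buffer_size=10000,           # smaller replay buffer for a tiny environment
    learning_starts=1000,        # steps of pure exploration before updates begin
    train_freq=4,                # one gradient update every 4 environment steps
    target_update_interval=500,  # how often the target network is synced (in steps)
    exploration_fraction=0.3,    # anneal epsilon over a larger share of training
    exploration_final_eps=0.05,
    verbose=1,
)
model.learn(total_timesteps=30000)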

Now, I increased the number of time steps to 30000

30k steps does not seem like much, and you will probably need to do some hyperparameter tuning.

Last but not least, take your time. You have opened many issues in the last few days. As mentioned in the docs, you should probably do some hyperparameter tuning and not expect everything to work out of the box without any tuning.

So, if you think there is an issue with SB3, please fill out the issue template completely (so we can reproduce the potential problem), but take your time; we do not do tech support.

0 reactions
araffin commented, Nov 16, 2020

, so I am not sure why at least DDQN was not implemented.

https://github.com/DLR-RM/stable-baselines3/issues/1

