CDQN nan actions
See original GitHub issue

I just ported the CDQN pendulum agent to an environment of mine. When I run the model, the first few steps produce valid action values, but the rest are NaN. I am not sure what is going on here. Let me know what I can provide to help debug.
> python .\dqn.py -d C:\Users\Ryan\Dropbox\cmu-sf\deepsf-data2 --visualize
Using Theano backend.
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
flatten_1 (Flatten) (None, 60) 0 flatten_input_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 16) 976 flatten_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation) (None, 16) 0 dense_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 16) 272 activation_1[0][0]
____________________________________________________________________________________________________
activation_2 (Activation) (None, 16) 0 dense_2[0][0]
____________________________________________________________________________________________________
dense_3 (Dense) (None, 16) 272 activation_2[0][0]
____________________________________________________________________________________________________
activation_3 (Activation) (None, 16) 0 dense_3[0][0]
____________________________________________________________________________________________________
dense_4 (Dense) (None, 1) 17 activation_3[0][0]
____________________________________________________________________________________________________
activation_4 (Activation) (None, 1) 0 dense_4[0][0]
====================================================================================================
Total params: 1537
____________________________________________________________________________________________________
None
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
flatten_2 (Flatten) (None, 60) 0 flatten_input_2[0][0]
____________________________________________________________________________________________________
dense_5 (Dense) (None, 16) 976 flatten_2[0][0]
____________________________________________________________________________________________________
activation_5 (Activation) (None, 16) 0 dense_5[0][0]
____________________________________________________________________________________________________
dense_6 (Dense) (None, 16) 272 activation_5[0][0]
____________________________________________________________________________________________________
activation_6 (Activation) (None, 16) 0 dense_6[0][0]
____________________________________________________________________________________________________
dense_7 (Dense) (None, 16) 272 activation_6[0][0]
____________________________________________________________________________________________________
activation_7 (Activation) (None, 16) 0 dense_7[0][0]
____________________________________________________________________________________________________
dense_8 (Dense) (None, 2L) 34 activation_7[0][0]
____________________________________________________________________________________________________
activation_8 (Activation) (None, 2L) 0 dense_8[0][0]
====================================================================================================
Total params: 1554
____________________________________________________________________________________________________
None
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
observation_input (InputLayer) (None, 1, 60L) 0
____________________________________________________________________________________________________
action_input (InputLayer) (None, 2L) 0
____________________________________________________________________________________________________
flatten_3 (Flatten) (None, 60) 0 observation_input[0][0]
____________________________________________________________________________________________________
merge_1 (Merge) (None, 62) 0 action_input[0][0]
flatten_3[0][0]
____________________________________________________________________________________________________
dense_9 (Dense) (None, 32) 2016 merge_1[0][0]
____________________________________________________________________________________________________
activation_9 (Activation) (None, 32) 0 dense_9[0][0]
____________________________________________________________________________________________________
dense_10 (Dense) (None, 32) 1056 activation_9[0][0]
____________________________________________________________________________________________________
activation_10 (Activation) (None, 32) 0 dense_10[0][0]
____________________________________________________________________________________________________
dense_11 (Dense) (None, 32) 1056 activation_10[0][0]
____________________________________________________________________________________________________
activation_11 (Activation) (None, 32) 0 dense_11[0][0]
____________________________________________________________________________________________________
dense_12 (Dense) (None, 3L) 99 activation_11[0][0]
____________________________________________________________________________________________________
activation_12 (Activation) (None, 3L) 0 dense_12[0][0]
====================================================================================================
Total params: 4227
____________________________________________________________________________________________________
None
Training for 21820000 steps ...
[-29.43209839 41.64512253]
[-26.13952446 42.74395752]
[-29.95537758 54.30570602]
[-28.84783554 35.84109497]
[-26.03454971 31.98110199]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
[ nan nan]
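Since the environment and exact keras-rl version aren't shown, one way to localize where the NaNs first appear is to scan the model's weight arrays after each training step. This is a hypothetical helper (not part of keras-rl); in Keras you would pass it `model.get_weights()`:

```python
import numpy as np

def first_nan_layer(weight_arrays):
    """Return the index of the first weight array containing NaN/Inf, or None.

    weight_arrays: a list of numpy arrays, e.g. model.get_weights() in Keras.
    """
    for i, w in enumerate(weight_arrays):
        if not np.all(np.isfinite(w)):
            return i
    return None

# Simulated weights: the third array has been corrupted by a diverging update.
weights = [np.ones((60, 16)), np.zeros(16), np.array([1.0, np.nan])]
print(first_nan_layer(weights))  # -> 2
```

If the first NaN shows up in the weights rather than in the raw observations, the divergence is happening during the update, which usually points at the learning rate or the Q-target computation rather than the environment.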
Issue Analytics
- State:
- Created: 7 years ago
- Comments: 22 (11 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments

@RyanHope @ViktorM @chenjie2001 Hi guys! Hopefully your problems are fixed or you found a workaround. There are two PRs that might fix your NaN issues with CDQN, if you still have the original offending code and don't mind giving them a test: the refactor_dqn branch in my fork, and https://github.com/matthiasplappert/keras-rl/pull/91. I'd be very interested in whether one or both changes have any effect on the NaNs you were getting.
Cheers!
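A common workaround for Q-updates that explode into NaN (beyond lowering the learning rate) is to clip the gradient norm before the optimizer applies it. Here is a minimal, framework-independent numpy sketch of global-norm clipping; it is an illustration of the technique, not the code in either PR:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm <= max_norm:
        return [g.copy() for g in grads]
    scale = max_norm / global_norm
    return [g * scale for g in grads]

# A gradient that has exploded to norm 100 is rescaled down to norm 1,
# preserving its direction: [60, 80] -> [0.6, 0.8].
grads = [np.array([60.0, 80.0])]
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(clipped[0])  # -> [0.6 0.8]
```

In Keras itself, per-gradient clipping is available directly on the optimizer via the `clipnorm` / `clipvalue` arguments (e.g. `Adam(lr=1e-4, clipnorm=1.0)`), which is worth trying if the PRs alone don't resolve the NaNs.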
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.