average_loss always 0 when using episodic_replay=True (DQN)
I am trying two different q_functions:
(non-recurrent)
# Imports needed by this snippet (paths per the older ChainerRL releases
# current at the time; adjust if your version differs):
import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
from chainerrl.links.mlp import MLP
from chainerrl.q_function import StateQFunction


class QFunction(chainer.Chain, StateQFunction):

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias),
        )
        lin_layer = L.Linear(128, 128)
        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])
        super().__init__(conv_layers=conv_layers, lin_layer=lin_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lin_layer(h)
        batch_size = x.shape[0]
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean
        ys = self.v_stream(h, test=test)
        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)
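For reference, the convolution arithmetic above implies an 84x84 input: 84 -> 20 -> 9 -> 7 -> 1 through the four convolutions, which is what lets the last L.Convolution2D feed the 128-unit linear layer. Below is a minimal shape check; it is a sketch only, and the 84x84 RGB observation and array name are my assumptions, not from the original report:

import numpy as np

# Hypothetical smoke test: push one dummy observation through the network
# and confirm the output covers all four actions.
q_func = QFunction(n_input_channels=3, n_actions=4)
obs = np.zeros((1, 3, 84, 84), dtype=np.float32)  # NCHW batch of one
av = q_func(obs)
print(av.q_values.shape)  # expected: (1, 4)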
(recurrent)
class QFunctionRecurrent(chainer.Chain, StateQFunction):

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias),
        )
        # The only change from QFunction: an LSTM replaces the linear layer.
        lstm_layer = L.LSTM(128, 128)
        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])
        super().__init__(conv_layers=conv_layers, lstm_layer=lstm_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lstm_layer(h)
        batch_size = x.shape[0]
        # Dueling aggregation, as in the non-recurrent version
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean
        ys = self.v_stream(h, test=test)
        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)
I found that for the non-recurrent version the loss is non-zero and the agent eventually masters the gym environment provided.
However, after changing nothing other than adding an LSTM layer and setting episodic_replay to True, the average_loss is 0 all the time and the agent is not able to learn to interact better with its environment.
At first I thought this was due to some kind of rounding issue, so I set minibatch_size=1 and episodic_update_len=1 (assuming that one episodic replay would then contain only one time step), but still nothing changed.
I wonder if this is some kind of bug or (which I think is more likely) an error on my side.
Any help is very much appreciated!
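For context, an agent exhibiting this behaviour would be constructed roughly as below. This is a sketch only: the optimizer, explorer, buffer capacity, and gamma are placeholder choices, and the keyword arguments assume the ChainerRL DQN API of that era, where episodic updates are enabled by passing an EpisodicReplayBuffer together with episodic_update=True:

import numpy as np

# Sketch of the agent setup (hypothetical values, not the exact script).
opt = chainer.optimizers.Adam()
opt.setup(q_func)

# Episodic replay stores whole episodes instead of single transitions.
rbuf = chainerrl.replay_buffer.EpisodicReplayBuffer(capacity=10 ** 4)

explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=lambda: np.random.randint(4))

agent = chainerrl.agents.DQN(
    q_func, opt, rbuf, gamma=0.99, explorer=explorer,
    minibatch_size=1,        # number of episodes sampled per update
    episodic_update=True,    # the setting that triggers the zero average_loss
    episodic_update_len=1)   # max time steps taken from each episode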
Thanks for your code!

You are correct. minibatch_size is the number of episodes to sample for an update. Each sampled episode's length is at most episodic_update_len.

As for average_loss, it turned out to be a bug in ChainerRL. Losses are computed and the model is updated as usual. However, the value of average_loss is not updated at all when episodic_update=True. I'll open an issue for it and fix it soon. Thanks for reporting it!

Hi, you need to make sure your model implements the chainerrl.recurrent.Recurrent interface so that it can be treated as a recurrent model. I guess the easiest way to do that is inheriting chainerrl.recurrent.RecurrentChainMixin, which will find L.LSTM by searching recursively in chainer.Chain and chainer.ChainList.

Documentation on the usage of recurrent models is almost missing, so I opened another issue for it, #83. Thanks for reporting the issue!
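A minimal sketch of that fix applied to the recurrent Q-function above, assuming RecurrentChainMixin lives in chainerrl.recurrent as in the ChainerRL versions of that time; only the class declaration changes:

from chainerrl.recurrent import RecurrentChainMixin

# With RecurrentChainMixin, ChainerRL can treat the model as recurrent:
# the mixin implements the Recurrent interface by locating L.LSTM links
# recursively inside chainer.Chain and chainer.ChainList children.
class QFunctionRecurrent(chainer.Chain, StateQFunction, RecurrentChainMixin):
    ...  # __init__ and __call__ unchanged from the definition above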