
average_loss always 0 when using episodic_replay=True (DQN)

See original GitHub issue

I am trying these two different Q-functions:

(non-recurrent)

# assumed imports, not part of the original snippet; the MLP and
# StateQFunction module paths may differ between chainerrl versions
import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
from chainerrl.links import MLP
from chainerrl.q_function import StateQFunction


class QFunction(chainer.Chain, StateQFunction):

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias),
        )

        lin_layer = L.Linear(128, 128)

        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])

        super().__init__(conv_layers=conv_layers, lin_layer=lin_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lin_layer(h)

        batch_size = x.shape[0]
        # Dueling head: advantage stream, centered by its mean over actions
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean

        # State-value stream
        ys = self.v_stream(h, test=test)

        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)


(recurrent)

class QFunctionRecurrent(chainer.Chain, StateQFunction):

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias),
        )

        # only difference to the non-recurrent version: an LSTM instead of
        # a plain linear layer between the conv stack and the dueling streams
        lstm_layer = L.LSTM(128, 128)

        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])

        super().__init__(conv_layers=conv_layers, lstm_layer=lstm_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lstm_layer(h)

        batch_size = x.shape[0]
        # Dueling head, as in the non-recurrent version
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean

        ys = self.v_stream(h, test=test)

        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)

I found that for the non-recurrent version the loss is non-zero and the agent eventually masters the gym environment provided.

However, after changing nothing other than adding an LSTM layer and setting episodic_replay to True, average_loss is always 0 and the agent is not able to learn to interact better with its environment.

At first I thought this was due to some kind of rounding issue, so I set minibatch_size=1 and episodic_update_len=1 (assuming that each episodic replay would now contain only one time step), but still nothing changed.
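
For reference, the setup I mean is roughly the following (a sketch rather than my exact script; the EpisodicReplayBuffer class and the episodic_update / episodic_update_len argument names are my reading of the ChainerRL API and may differ between versions):

import numpy as np
import chainer
import chainerrl

q_func = QFunctionRecurrent(n_input_channels=3, n_actions=4)

opt = chainer.optimizers.Adam()
opt.setup(q_func)

# whole episodes are stored so they can be replayed in temporal order
rbuf = chainerrl.replay_buffer.EpisodicReplayBuffer(capacity=10 ** 5)

explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=lambda: np.random.randint(4))

agent = chainerrl.agents.DQN(
    q_func, opt, rbuf, gamma=0.99, explorer=explorer,
    minibatch_size=1,        # number of episodes sampled per update
    episodic_update=True,    # replay whole episodes (needed for the LSTM)
    episodic_update_len=1)   # time steps drawn from each sampled episode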

I wonder if this is some kind of bug or (which I think is more likely) an error on my side.

Any help is very much appreciated!

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
muupan commented, May 3, 2017

Thanks for your code!

Just for clarification:

If episodic_replay=True, then:

minibatch_size corresponds to the number of episodes used for experience replay

and

episodic_update_len corresponds to the number of time steps used within each of those episodes, right?

Thus, if one episode has e.g. 50 time steps and episodic_update_len=16, it will draw 16 consecutive time steps from this episode for replay? Furthermore, if episodic_update_len=None, will it use all time steps within this episode?

You are correct. minibatch_size is the number of episodes to sample for an update. Each sampled episode’s length is at most episodic_update_len.
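
In pseudocode, the sampling described above amounts to something like this (an illustrative sketch, not the actual ChainerRL implementation; whether the window of consecutive steps starts at a random offset is an assumption):

import random

def sample_episodic_minibatch(episodes, minibatch_size, episodic_update_len):
    """Pick `minibatch_size` episodes; truncate each to at most
    `episodic_update_len` consecutive time steps (or keep it whole if None)."""
    batch = []
    for episode in random.sample(episodes, minibatch_size):
        if episodic_update_len is None or len(episode) <= episodic_update_len:
            batch.append(episode)  # use the whole episode
        else:
            # assumption: the window of consecutive steps starts at a random offset
            start = random.randint(0, len(episode) - episodic_update_len)
            batch.append(episode[start:start + episodic_update_len])
    return batch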

As for average_loss, it turned out to be a bug in ChainerRL. Losses are computed and the model is updated as usual. However, the value of average_loss is not updated at all when episodic_update=True. I’ll open an issue for it and fix it soon. Thanks for reporting it!
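
For context, since average_loss is just a running statistic of the per-update loss (initialised to 0), skipping its update leaves it frozen at 0 even though training proceeds normally. An illustrative sketch of such an exponential moving average (not the actual ChainerRL source):

def update_average_loss(average_loss, latest_loss, decay=0.99):
    # exponentially decayed running average; skipping this call leaves
    # the reported statistic frozen at its initial value of 0
    return decay * average_loss + (1 - decay) * latest_loss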

1 reaction
muupan commented, Apr 26, 2017

Hi, you need to make sure your model implements the chainerrl.recurrent.Recurrent interface so that it can be treated as a recurrent model. I guess the easiest way to do it is inheriting from chainerrl.recurrent.RecurrentChainMixin like

class QFunctionRecurrent(chainer.Chain, StateQFunction, RecurrentChainMixin):

, which will find L.LSTM by searching recursively in chainer.Chain and chainer.ChainList.
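
Concretely, the only change needed is the extra base class (the import path below is my understanding of where the mixin lives):

from chainerrl.recurrent import RecurrentChainMixin

class QFunctionRecurrent(chainer.Chain, StateQFunction, RecurrentChainMixin):
    # __init__ and __call__ stay exactly as in the snippet above; the mixin
    # lets chainerrl find the internal L.LSTM and manage its hidden state
    ...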

Documentation on the usage of recurrent models is mostly missing, so I opened another issue for it, #83. Thanks for reporting the issue!
