Problem while using the code
Hello @miyosuda,
Thanks for sharing the code. Please ignore the title: I tried out your code on the cartpole balancing control problem instead of the Atari games, and it works well. I have a few questions, though.
I am curious: in the asynchronous methods paper, they also evaluated a model variant with one linear layer, one LSTM layer, and a softmax output. I am thinking of trying this model to see whether it improves the results. Can you suggest how the LSTM could be implemented in TensorFlow for the Atari case?
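For what it's worth, this is roughly what I had in mind, using the TF 1.x RNN API (BasicLSTMCell / dynamic_rnn). It is only a sketch: the layer sizes, variable names, and num_actions are placeholders I picked, not values from the paper or this repo, and the convolutional stack that would feed the linear layer for Atari frames is omitted.

import tensorflow as tf

num_actions = 4  # placeholder, would come from the ALE action set

# [batch, time, features]; for Atari the features would come from the conv layers
state_ph = tf.placeholder(tf.float32, [None, None, 128])

# Linear (fully-connected) layer applied at every time step
w_fc = tf.get_variable("w_fc", [128, 256])
b_fc = tf.get_variable("b_fc", [256])
fc = tf.nn.relu(tf.tensordot(state_ph, w_fc, axes=[[2], [0]]) + b_fc)

# Single LSTM layer; the returned final_state would be carried over between
# rollouts when acting in the environment, and reset at episode boundaries
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(256, state_is_tuple=True)
initial_state = lstm_cell.zero_state(tf.shape(state_ph)[0], tf.float32)
lstm_out, final_state = tf.nn.dynamic_rnn(lstm_cell, fc, initial_state=initial_state)

# Softmax policy head on top of the LSTM output (a value head would be analogous)
w_pi = tf.get_variable("w_pi", [256, num_actions])
b_pi = tf.get_variable("b_pi", [num_actions])
pi = tf.nn.softmax(tf.tensordot(lstm_out, w_pi, axes=[[2], [0]]) + b_pi)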
I also noticed that the accumulated states and rewards are reversed; do the actions and values need to be reversed as well? It did not make any difference when I tried it, but I am wondering why.
states.reverse()
rewards.reverse()
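Just to spell out my understanding (a paraphrase, not the exact code from this repo): the reversal only exists so that the discounted return can be accumulated in a single backward pass, and what matters is that each state stays paired with its own action, value, and return. Something like this, where GAMMA and the bootstrap value R of the final state are assumed to be defined elsewhere:

# Paraphrased sketch of the backward pass; GAMMA and R (bootstrap value) assumed defined.
states.reverse()
actions.reverse()
rewards.reverse()
values.reverse()

batch_si, batch_a, batch_td, batch_R = [], [], [], []
for (si, ai, ri, Vi) in zip(states, actions, rewards, values):
    R = ri + GAMMA * R          # discounted return, accumulated back-to-front
    td = R - Vi                 # advantage estimate used by the policy gradient
    batch_si.append(si)
    batch_a.append(ai)
    batch_td.append(td)
    batch_R.append(R)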
Last, do you really need to accumulate the gradients and then apply the update, given that TensorFlow can handle the update as a "batch"?
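To clarify what I mean by "batch": feed the whole t_max rollout at once and apply the gradients in a single optimizer step, instead of keeping explicit per-step gradient accumulators. A rough sketch with placeholder names (policy_loss and value_loss assumed to be built over the rollout batch; the hyperparameters are just examples, not this repo's settings):

# Sketch of the "batch" alternative; names and hyperparameters are placeholders.
optimizer = tf.train.RMSPropOptimizer(learning_rate=7e-4, decay=0.99, epsilon=0.1)
total_loss = policy_loss + 0.5 * value_loss              # summed over the whole rollout
grads_and_vars = optimizer.compute_gradients(total_loss)
apply_op = optimizer.apply_gradients(grads_and_vars)     # one update per rollout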
@miyosuda For everyone’s information, I summarized their settings here: https://github.com/muupan/async-rl/wiki
Many thanks for your reply, and I am glad to hear that you also plan to work on the LSTM model.
I just uploaded the test code (based on this repo) for the "batch" update mentioned above: https://github.com/originholic/a3c_vrep.git
I have only tested it on the cartpole balancing domain, but I found it actually takes longer to reach the desired score than your implementation. I will investigate this later; for now I will continue working with your implementation to study the LSTM model, which I am not familiar with.
Also, instead of a constant learning rate, I don't know whether the random initialization of the learning rate mentioned in the paper could help improve the results:
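What I mean is something like the log-uniform sampling of the initial learning rate used for the hyperparameter search in the paper; the range below (1e-4 to 1e-2) is my reading of the paper, so treat it as an assumption:

import numpy as np

# Sample an initial learning rate from a log-uniform distribution over [lo, hi]
def log_uniform(lo, hi):
    return float(np.exp(np.random.uniform(np.log(lo), np.log(hi))))

initial_learning_rate = log_uniform(1e-4, 1e-2)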