
Problem while using the code

See original GitHub issue

Hello @miyosuda,

Thanks for sharing the code. Please ignore the title: I tried out your code on the CartPole balance control problem instead of an Atari game, and it works well. But I have a few questions to ask.

I am curious: in the asynchronous methods paper, the authors also evaluate a model variant with one linear layer, one LSTM layer, and a softmax output. I am thinking of using this model to see whether it improves the results. Can you suggest how the LSTM could be implemented in TensorFlow for the Atari case?
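
For concreteness, here is a minimal sketch of what that linear → LSTM → softmax variant could look like in TensorFlow 1.x, feeding one rollout as a single sequence with batch size 1. All names and sizes below (num_actions, the 256-unit width, the placeholders) are illustrative assumptions, not code from this repo:

import tensorflow as tf

num_actions = 4                                            # hypothetical Atari action count
lstm_size = 256

features = tf.placeholder(tf.float32, [None, lstm_size])  # output of the linear layer
step_size = tf.placeholder(tf.int32, [1])                  # rollout length

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True)
c_in = tf.placeholder(tf.float32, [1, lstm_size])          # LSTM state carried between rollouts
h_in = tf.placeholder(tf.float32, [1, lstm_size])
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

rnn_in = tf.expand_dims(features, 0)                       # [1, steps, lstm_size]
outputs, state_out = tf.nn.dynamic_rnn(
    cell, rnn_in, initial_state=state_in,
    sequence_length=step_size, time_major=False)
outputs = tf.reshape(outputs, [-1, lstm_size])

pi = tf.layers.dense(outputs, num_actions, activation=tf.nn.softmax)  # policy head
v = tf.layers.dense(outputs, 1)                                       # value head

The c/h state returned in state_out would be fed back through c_in/h_in at the start of the next rollout and reset to zeros at episode boundaries.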

I am also wondering about the accumulated states and rewards being reversed: do you need to reverse the actions and values as well? It did not make any difference when I tried it; I am just wondering why.

states.reverse()
rewards.reverse()
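
(For readers puzzling over this: the reversal exists because the n-step return is unrolled backwards from the bootstrap value. A self-contained sketch with made-up toy values, not the repo's code:)

GAMMA = 0.99

states = ['s0', 's1', 's2']          # toy rollout, purely illustrative
actions = [0, 1, 0]
rewards = [0.0, 0.0, 1.0]
bootstrap_value = 0.0                # V of the state after the rollout; 0.0 if terminal

R = bootstrap_value
batch_s, batch_a, batch_R = [], [], []
# R_t = r_t + GAMMA * R_{t+1} is computed from the last step backwards,
# which is why the lists are traversed in reverse. Every per-step list
# read in this loop (actions too) must be traversed in the same reversed
# order so that entries stay index-aligned.
for s, a, r in zip(reversed(states), reversed(actions), reversed(rewards)):
    R = r + GAMMA * R
    batch_s.append(s)
    batch_a.append(a)
    batch_R.append(R)

So what matters is alignment: whether you call .reverse() on each list or iterate with reversed(), all per-step lists must end up in the same order.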

Last, do you really need to accumulate the gradients and then apply the update, given that TensorFlow can handle the ‘batch’ update itself?
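
(One hedged way to picture the ‘batch’ alternative in TF 1.x: compute the loss over the whole rollout at once and take a single optimizer step, instead of summing per-step gradients by hand. The variables below are minimal stand-ins, not this repo's graph:)

import tensorflow as tf

global_w = tf.Variable(tf.zeros([4, 2]), name='global_w')   # shared parameters
local_w = tf.Variable(tf.zeros([4, 2]), name='local_w')     # per-thread copy

states = tf.placeholder(tf.float32, [None, 4])    # whole rollout as one batch
targets = tf.placeholder(tf.float32, [None, 2])
loss = tf.reduce_sum(tf.square(tf.matmul(states, local_w) - targets))

optimizer = tf.train.RMSPropOptimizer(learning_rate=7e-4, decay=0.99, epsilon=0.1)
grads = tf.gradients(loss, [local_w])
grads, _ = tf.clip_by_global_norm(grads, 40.0)
# Gradients computed on the local copy are applied to the shared parameters:
# one apply_gradients call per rollout, no manual accumulation loop.
train_op = optimizer.apply_gradients(zip(grads, [global_w]))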

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 78 (36 by maintainers)

Top GitHub Comments

3 reactions
muupan commented, May 10, 2016

@miyosuda For everyone’s information, I summarized their settings here: https://github.com/muupan/async-rl/wiki

1 reaction
originholic commented, Apr 15, 2016

Many thanks for your reply; I’m glad to hear that you also plan to work on the LSTM model.

I just uploaded the test code (based on this repo) for the “batch” update I mentioned: https://github.com/originholic/a3c_vrep.git

I only tested it on the CartPole balance domain, but somehow it actually takes longer to reach the desired score than your implementation. I will investigate this later; for now I will continue working with your implementation to study the LSTM model, which I am not familiar with.

Also, instead of a constant learning rate:

math.exp(log_lo * (1 - rate) + log_hi * rate)

I don’t know whether the random initialization of the learning rate mentioned in the paper can help improve the results:

math.exp(random.uniform(log_lo, log_hi))
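
(For what it’s worth, the two expressions describe the same distribution once rate is sampled uniformly; a small self-contained sketch, with made-up bounds:)

import math
import random

LOG_LO = math.log(1e-4)   # hypothetical search-range bounds
LOG_HI = math.log(1e-2)

def log_uniform(log_lo, log_hi, rate):
    # Interpolate in log space; rate in [0, 1].
    return math.exp(log_lo * (1.0 - rate) + log_hi * rate)

# Sampling rate uniformly makes the learning rate log-uniform over
# [1e-4, 1e-2], i.e. the same as math.exp(random.uniform(LOG_LO, LOG_HI)).
initial_alpha = log_uniform(LOG_LO, LOG_HI, random.uniform(0.0, 1.0))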

