Problem while using the code
Hello @miyosuda,
Thanks for sharing the code. Please ignore the title: I tried out your code on the cartpole balancing control problem instead of the Atari games, and it works well. I have a few questions, though.
I am curious: in the asynchronous methods paper, they also evaluated a model variant with one linear layer, one LSTM layer, and a softmax output. I am thinking of trying this model to see whether it improves the results. Can you suggest how the LSTM could be implemented in TensorFlow for the Atari case?
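For what it's worth, this is roughly what I had in mind, using the TF 1.x RNN API (BasicLSTMCell / dynamic_rnn). It is only a sketch: the layer sizes, variable names, and num_actions are placeholders I picked, not values from the paper or this repo, and the convolutional stack that would feed the linear layer for Atari frames is omitted.

import tensorflow as tf

num_actions = 4  # placeholder, would come from the ALE action set

# [batch, time, features]; for Atari the features would come from the conv layers
state_ph = tf.placeholder(tf.float32, [None, None, 128])

# Linear (fully-connected) layer applied at every time step
w_fc = tf.get_variable("w_fc", [128, 256])
b_fc = tf.get_variable("b_fc", [256])
fc = tf.nn.relu(tf.tensordot(state_ph, w_fc, axes=[[2], [0]]) + b_fc)

# Single LSTM layer; the returned final_state would be carried over between
# rollouts when acting in the environment, and reset at episode boundaries
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(256, state_is_tuple=True)
initial_state = lstm_cell.zero_state(tf.shape(state_ph)[0], tf.float32)
lstm_out, final_state = tf.nn.dynamic_rnn(lstm_cell, fc, initial_state=initial_state)

# Softmax policy head on top of the LSTM output (a value head would be analogous)
w_pi = tf.get_variable("w_pi", [256, num_actions])
b_pi = tf.get_variable("b_pi", [num_actions])
pi = tf.nn.softmax(tf.tensordot(lstm_out, w_pi, axes=[[2], [0]]) + b_pi)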
I also noticed that the accumulated states and rewards are reversed; do the actions and values need to be reversed as well? It did not make any difference when I tried it, but I am wondering why.
states.reverse()
rewards.reverse()
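Just to spell out my understanding (a paraphrase, not the exact code from this repo): the reversal only exists so that the discounted return can be accumulated in a single backward pass, and what matters is that each state stays paired with its own action, value, and return. Something like this, where GAMMA and the bootstrap value R of the final state are assumed to be defined elsewhere:

# Paraphrased sketch of the backward pass; GAMMA and R (bootstrap value) assumed defined.
states.reverse()
actions.reverse()
rewards.reverse()
values.reverse()

batch_si, batch_a, batch_td, batch_R = [], [], [], []
for (si, ai, ri, Vi) in zip(states, actions, rewards, values):
    R = ri + GAMMA * R          # discounted return, accumulated back-to-front
    td = R - Vi                 # advantage estimate used by the policy gradient
    batch_si.append(si)
    batch_a.append(ai)
    batch_td.append(td)
    batch_R.append(R)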
Last, do you really need to accumulate the gradients and then apply the update, given that TensorFlow can handle the update as a "batch"?
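To clarify what I mean by "batch": feed the whole t_max rollout at once and apply the gradients in a single optimizer step, instead of keeping explicit per-step gradient accumulators. A rough sketch with placeholder names (policy_loss and value_loss assumed to be built over the rollout batch; the hyperparameters are just examples, not this repo's settings):

# Sketch of the "batch" alternative; names and hyperparameters are placeholders.
optimizer = tf.train.RMSPropOptimizer(learning_rate=7e-4, decay=0.99, epsilon=0.1)
total_loss = policy_loss + 0.5 * value_loss              # summed over the whole rollout
grads_and_vars = optimizer.compute_gradients(total_loss)
apply_op = optimizer.apply_gradients(grads_and_vars)     # one update per rollout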
@miyosuda For everyone’s information, I summarized their settings here: https://github.com/muupan/async-rl/wiki
Many thanks for your reply, and I am glad to hear that you also plan to work on the LSTM model.
I just uploaded the test code (based on this repo) for the "batch" update mentioned above: https://github.com/originholic/a3c_vrep.git
I have only tested it on the cartpole balancing domain, but I found it actually takes longer to reach the desired score than your implementation. I will investigate this later; for now I will continue working with your implementation to study the LSTM model, which I am not familiar with.
Also, instead of a constant learning rate, I don't know whether the random initialization of the learning rate mentioned in the paper could help improve the results:
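What I mean is something like the log-uniform sampling of the initial learning rate used for the hyperparameter search in the paper; the range below (1e-4 to 1e-2) is my reading of the paper, so treat it as an assumption:

import numpy as np

# Sample an initial learning rate from a log-uniform distribution over [lo, hi]
def log_uniform(lo, hi):
    return float(np.exp(np.random.uniform(np.log(lo), np.log(hi))))

initial_learning_rate = log_uniform(1e-4, 1e-2)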