Testing should not be deterministic

There is a parameter --evaluation-episodes, but in the current implementation, since we always act greedily, all the episodes are going to be exactly the same. I think that to get a better evaluation, you should add a deterministic=False option when testing (i.e. instead of taking the action with the highest Q-value, you can sample over all the actions, using each Q-value as the probability).
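As a rough illustration of the suggestion above, here is a minimal sketch of an action-selection helper with a deterministic switch. It is not the code from the repository or from the linked commit: the function name select_action, the temperature parameter, and the use of a softmax to turn Q-values into probabilities are all assumptions made for this example (raw Q-values can be negative and do not sum to 1, so some normalisation is needed before sampling).

```python
import math
import random


def select_action(q_values, deterministic=True, temperature=1.0, rng=random):
    """Pick an action index from a list of Q-values (hypothetical helper).

    deterministic=True  -> greedy argmax (same action for the same Q-values).
    deterministic=False -> sample from a softmax over the Q-values.
    """
    if deterministic:
        return max(range(len(q_values)), key=lambda a: q_values[a])

    # Subtract the max before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    # random.choices normalises the weights into probabilities internally.
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    print("greedy:", select_action(q))
    print("sampled:", [select_action(q, deterministic=False) for _ in range(10)])
```

With deterministic=False, evaluation episodes started from the same state can diverge, so averaging over --evaluation-episodes becomes informative. A lower temperature pushes the sampling back towards greedy behaviour.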

I implemented that on my branch in the last commit, marintoro@d061caf1c5abcb0818c1e8966120ce3b86b875da (it’s really straightforward).

Btw, I launched a training run last night and everything worked properly. But I don’t have access to a powerful computer yet, so the agent was still performing pretty poorly (it was in the early stage of training). I just wanted to know whether you have already launched a big training run, on which game, and whether you compared it to a standard DRL algorithm (like simple DQN, for example). There may still be some non-breaking errors in the implementation which could be sneaky to spot and debug (for example, if the agent learns worse than simple DQN, there must be something wrong).

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
Kaixhin commented, Jan 19, 2018

Looks reasonable so far. The Q-values increased rapidly, and have now stabilised (looking very similar to the values of the Double DQN).

[Image: newplot 1]

The reward itself is clearly increasing (noisily, but at a reasonable level - not one at which I’d say there’s definitely a problem). It’s pretty much at the level of a trained Double DQN at about 1/3 of the training steps - but of course according to the Rainbow paper the score only really takes off after the halfway mark (and even then many runs may not work out so well, so even if this fails after the full run it’s unfortunately not conclusive).

[Image: newplot]

1 reaction
marintoro commented, Jan 17, 2018

OK. On my side I will launch a training run on Breakout as a quick sanity check, because I think it’s easier to see there whether the agent has really learned something or is just playing randomly. In Space Invaders it’s pretty easy to be convinced that a fully random agent is playing pretty well ^^

Concerning sampling via Q-values versus just drawing new weights for the Noisy layers, I really don’t know; we should maybe try both and compare (Q-value sampling may lead to way too much exploration, but on the other hand, in the late stage of training the agent may have learned to ignore all incoming noise from the Noisy layers…).
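To make the second option concrete, here is a toy sketch of what "drawing new weights for the Noisy layers" could look like at evaluation time. It is an illustration only, not the repository's layer: the class NoisyLinear and the helper greedy_actions are hypothetical, the layer uses independent Gaussian noise per weight and no bias, and the weights are hand-picked. It shows the concern raised above: if the learned noise scales (sigma) have shrunk to zero, resampling the noise no longer changes the greedy action.

```python
import random


class NoisyLinear:
    """Toy noisy layer: weight = mu + sigma * eps, with eps ~ N(0, 1)."""

    def __init__(self, mu, sigma, rng):
        self.mu = mu          # mean weights, one row per action
        self.sigma = sigma    # noise scales, same shape as mu
        self.rng = rng
        self.reset_noise()

    def reset_noise(self):
        # Draw a fresh eps for every weight.
        self.eps = [[self.rng.gauss(0.0, 1.0) for _ in row] for row in self.mu]

    def forward(self, x):
        # One output (treated here as a Q-value) per row of weights.
        return [
            sum((m + s * e) * xi for m, s, e, xi in zip(mu_r, sig_r, eps_r, x))
            for mu_r, sig_r, eps_r in zip(self.mu, self.sigma, self.eps)
        ]


def greedy_actions(layer, x, n_draws):
    """Greedy action after each of n_draws noise resamples."""
    actions = []
    for _ in range(n_draws):
        layer.reset_noise()
        q = layer.forward(x)
        actions.append(max(range(len(q)), key=lambda a: q[a]))
    return actions


if __name__ == "__main__":
    mu = [[1.0, 0.0], [0.5, 0.1]]   # hand-picked weights for two actions
    x = [1.0, 1.0]
    quiet = NoisyLinear(mu, [[0.0, 0.0], [0.0, 0.0]], random.Random(0))
    noisy = NoisyLinear(mu, [[1.0, 1.0], [1.0, 1.0]], random.Random(0))
    print("sigma = 0:", set(greedy_actions(quiet, x, 50)))
    print("sigma = 1:", set(greedy_actions(noisy, x, 200)))
```

In this toy setting, evaluation with resampled noise is only as varied as the learned sigma values allow, which is one reason comparing both approaches, as proposed above, seems worthwhile.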
