Discount not applied in evaluate_policy?
See original GitHub issue.

Maybe I am missing something here, but I believe the line that accumulates the test-episode return, https://github.com/keiohta/tf2rl/blob/82d9eecda78e22021efa0821bf02429ac7827f4d/tf2rl/experiments/trainer.py#L207, should be updated to include the discount factor:
    for j in range(total_steps):
        action = self._policy.get_action(obs, test=True)
        next_obs, reward, done, _ = self._test_env.step(action)
        avg_test_steps += 1
        if self._save_test_path:
            replay_buffer.add(obs=obs, act=action,
                              next_obs=next_obs, rew=reward, done=done)
        if self._save_test_movie:
            element = self._test_env.render(mode='rgb_array')
            frames.append(element)
        elif self._show_test_progress:
            self._test_env.render()
        episode_return += reward * np.power(self._policy.discount, j)
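For clarity, the following is a minimal, self-contained sketch (plain Python, not tf2rl code; episode_returns is a hypothetical helper) of the two quantities under discussion, assuming the exponent is the step index within a single episode:

    import numpy as np

    def episode_returns(rewards, discount):
        # Hypothetical helper: given the reward sequence of one episode,
        # compute both the undiscounted and the discounted return.
        rewards = np.asarray(rewards, dtype=np.float64)
        undiscounted = float(rewards.sum())
        discounted = float(np.sum(rewards * np.power(discount, np.arange(len(rewards)))))
        return undiscounted, discounted

For example, episode_returns([1.0, 1.0, 1.0], 0.99) gives (3.0, 2.9701); the proposed change would make evaluate_policy report the second number instead of the first.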
Issue Analytics
- Created: 2 years ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
I agree with @ymd-h that the evaluation score does not include the discount factor. I think the reason the DDQN paper reports the discounted return is to evaluate the overestimation phenomenon: since the Q-network produces an estimate of the discounted cumulative reward, the "true" return it is compared against must also be computed with the discount factor. I don't think other papers report discounted returns.
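As a sketch of that diagnostic (plain Python with hypothetical names, not code from the DDQN paper or tf2rl): record the network's value estimate at the start of each test episode and compare it against the realized return discounted with the same gamma.

    import numpy as np

    def discounted_return(rewards, gamma):
        # Realized discounted return of one episode: sum_t gamma^t * r_t.
        return float(sum(r * gamma ** t for t, r in enumerate(rewards)))

    def overestimation_gap(q0_estimates, episode_rewards, gamma):
        # q0_estimates[i]: the network's Q(s_0, a_0) at the start of episode i.
        # episode_rewards[i]: the rewards realized during episode i.
        # The Q-network estimates the *discounted* cumulative reward, so the
        # realized return must use the same gamma to be comparable; a
        # positive mean gap indicates overestimation.
        realized = [discounted_return(r, gamma) for r in episode_rewards]
        return float(np.mean(np.asarray(q0_estimates) - realized))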
@naji-s
Although the paper (maybe) doesn't state the definition explicitly, I think the plots show non-discounted rewards obtained from models trained with discounted rewards.
As long as the discount factor (gamma) is fixed (and the n-step is fixed), you can use the discounted return for model comparison, but it is not a universal metric. Because we tune the discount factor to improve model performance, the evaluation metric itself should be independent of the discount factor.
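To make that concrete, a minimal sketch in plain Python: the same fixed trajectory receives a different discounted score under each gamma, so runs tuned with different discount factors are not comparable on that metric, while the undiscounted sum is invariant.

    rewards = [0.0, 0.0, 10.0]  # one fixed reward sequence
    for gamma in (0.9, 0.99, 1.0):
        disc = sum(r * gamma ** t for t, r in enumerate(rewards))
        print(f"gamma={gamma}: discounted return = {disc:.4f}")
    # gamma=0.9:  discounted return = 8.1000
    # gamma=0.99: discounted return = 9.8010
    # gamma=1.0:  discounted return = 10.0000  (the undiscounted return)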