
[Feature Request] Log actual return, and not only mean return


🚀 Log actual return, and not only mean return

Motivation

When you use Monitor, the mean return over the last 100 episodes is logged (to TensorBoard and to the terminal). This happens because self.ep_info_buffer is instantiated as a deque(maxlen=100) in the BaseAlgorithm class, and the _dump_logs method logs the mean of the values in this buffer. I think it is more "natural" to log the actual episode return (the total sum of rewards in that specific episode), which is what is commonly reported in papers.
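
To make this concrete, here is a minimal sketch (not SB3's actual code) of the buffering behavior just described; the episode returns and lengths are invented for illustration:

    from collections import deque

    ep_info_buffer = deque(maxlen=100)  # as instantiated in BaseAlgorithm

    # Pretend three episodes just finished with these total rewards and lengths
    # (Monitor appends a dict like {"r": ..., "l": ...} when an episode ends).
    for ep_return, ep_len in [(10.0, 200), (12.5, 180), (9.0, 210)]:
        ep_info_buffer.append({"r": ep_return, "l": ep_len})

    # What _dump_logs reports: the mean over the (up to) 100 buffered episodes...
    print(sum(info["r"] for info in ep_info_buffer) / len(ep_info_buffer))  # 10.5
    # ...not the return of the most recent episode:
    print(ep_info_buffer[-1]["r"])  # 9.0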

Pitch

Log the return (total episode reward) every time an episode ends.

Alternatives

This can be done simply by adding the line marked below to the _dump_logs() method of the OffPolicyAlgorithm class:

    if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:
        logger.record("rollout/return", self.ep_info_buffer[-1]["r"])  # ADD THIS LINE
        logger.record("rollout/ep_rew_mean", safe_mean([ep_info["r"] for ep_info in self.ep_info_buffer]))
        logger.record("rollout/ep_len_mean", safe_mean([ep_info["l"] for ep_info in self.ep_info_buffer]))

Additional context

[Figure: rollout/ep_rew_mean vs. actual return (created by me using the modification above).]

Checklist

  • I have checked that there is no similar issue in the repo (required)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

araffin commented, Feb 8, 2021 (1 reaction)

because the goal of RL is to maximize return, and it is what someone would most probably report on a paper.

I would disagree with that statement. First, most of the time, as mentioned in the documentation, you should not report the training reward but use a separate environment for periodic evaluation (with deterministic actions, except for Atari games); this is what the EvalCallback is meant for (included in the RL Zoo, together with a plotting script, see reproducing results). Then, in papers, when the training reward is reported, it is usually done using a smoothing window, otherwise it is too noisy to be readable. Also, if you want to report the episode return with different windows (or no smoothing window at all), we save the monitor files and have a plotting script for that in the zoo too.
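
For context, a minimal sketch of the periodic-evaluation setup mentioned above; EvalCallback is the SB3 class referenced in the comment, but the environment id, frequencies, and paths below are illustrative choices, not values from this issue:

    import gym
    from stable_baselines3 import SAC
    from stable_baselines3.common.callbacks import EvalCallback

    train_env = gym.make("Pendulum-v1")
    eval_env = gym.make("Pendulum-v1")  # separate environment, as recommended

    eval_callback = EvalCallback(
        eval_env,
        n_eval_episodes=5,        # returns are averaged over these episodes
        eval_freq=1_000,          # evaluate every 1000 training steps
        deterministic=True,       # deterministic actions (except for Atari)
        log_path="./eval_logs/",  # per-evaluation returns saved to evaluations.npz
    )

    model = SAC("MlpPolicy", train_env)
    model.learn(total_timesteps=10_000, callback=eval_callback)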

Another option (maybe better) would be to allow changing the size of the buffer.

this won’t work, as it also depends on the log interval…

araffin commented, Feb 7, 2021 (1 reaction)

Hello, please fill in the issue template completely 😉 (including the proposed solution and alternatives).

I think it is more “natural” to log the actual episode return, which is what is commonly reported on papers.

I’m not sure what you mean by “actual return”

