[Feature Request] Log actual return, and not only mean return
Motivation
When using the `Monitor` wrapper, the mean return over the last 100 episodes is logged (to TensorBoard and to the terminal). This happens because `self.ep_info_buffer` is instantiated as a `deque(maxlen=100)` in the `BaseAlgorithm` class, and the `_dump_logs()` method logs the mean of the values in this buffer. I think it is more "natural" to log the actual episode return (the total sum of rewards in that specific episode), which is what is commonly reported in papers.
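To make the difference concrete, here is a small standalone sketch that mimics the buffer with a plain `collections.deque(maxlen=100)` of Monitor-style episode dicts (`"r"` for return, `"l"` for length). The episode returns are made up for illustration, and no Stable-Baselines3 code is involved.

```python
from collections import deque
from statistics import mean

# Mimic ep_info_buffer: a deque(maxlen=100) of Monitor-style episode info
# dicts with "r" (episode return) and "l" (episode length) keys.
ep_info_buffer = deque(maxlen=100)

# Hypothetical returns: episode i (0-based) has return float(i).
for i in range(150):
    ep_info_buffer.append({"r": float(i), "l": 10})

# Only the 100 most recent episodes (returns 50.0 .. 149.0) are kept.
ep_rew_mean = mean(ep_info["r"] for ep_info in ep_info_buffer)
last_return = ep_info_buffer[-1]["r"]

print(len(ep_info_buffer))  # 100
print(ep_rew_mean)          # 99.5
print(last_return)          # 149.0
```

The logged mean (99.5) lags behind the return of the most recent episode (149.0) whenever performance is still changing.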
Pitch
Log the return (total episode reward) every time an episode ends.
Alternatives
This can be done simply by adding the marked line below to the `_dump_logs()` method of the `OffPolicyAlgorithm` class:
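```python
from stable_baselines3.common import logger
from stable_baselines3.common.utils import safe_mean

# Inside OffPolicyAlgorithm._dump_logs() (module-level `logger`, as in older SB3 versions):
if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:
    logger.record("rollout/return", self.ep_info_buffer[-1]["r"])  # ADD THIS LINE: return of the latest finished episode
    logger.record("rollout/ep_rew_mean", safe_mean([ep_info["r"] for ep_info in self.ep_info_buffer]))
    logger.record("rollout/ep_len_mean", safe_mean([ep_info["l"] for ep_info in self.ep_info_buffer]))
```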
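One caveat, raised in one of the comments below: in off-policy algorithms the logs are dumped only every `log_interval` episodes, so `ep_info_buffer[-1]` captures just the latest episode at each dump and skips the others. The standalone simulation below illustrates this. `LOG_INTERVAL` and the returns are hypothetical, and the loop only mimics the dump schedule; it is not SB3's actual training loop.

```python
from collections import deque

LOG_INTERVAL = 4  # hypothetical value of the log_interval argument

ep_info_buffer = deque(maxlen=100)
logged_returns = []  # what "rollout/return" would receive

# Hypothetical returns for 10 finished episodes.
episode_returns = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0, 4.0, 7.0, 6.0, 0.0]

for episode_num, ret in enumerate(episode_returns, start=1):
    ep_info_buffer.append({"r": ret, "l": 10})
    # Logs are dumped only every LOG_INTERVAL episodes, so only the
    # most recent episode at that moment gets recorded.
    if episode_num % LOG_INTERVAL == 0:
        logged_returns.append(ep_info_buffer[-1]["r"])

print(logged_returns)  # [8.0, 7.0]
```

Eight of the ten episode returns never reach the logger, so the logged curve depends on the chosen interval.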
Additional context
[Figure: `rollout/ep_rew_mean` vs. actual return (created by me using the modification above).]
### Checklist
- I have checked that there is no similar issue in the repo (required)
Issue Analytics
- Created 3 years ago
- Comments:6 (5 by maintainers)
Top GitHub Comments
I would disagree with that statement. First, most of the time, as mentioned in the documentation, you should not report the training reward but use a separate environment for periodic evaluation (with deterministic actions, except for Atari games); this is what the `EvalCallback` is meant for (included in the RL Zoo, together with a plotting script; see reproducing results). Then, in papers, when the training reward is reported, it is usually done using a smoothing window, otherwise it is too noisy to be readable. Also, if you want to report the episode return with different windows (or no smoothing window at all), we save the monitor files and have a plotting script for that in the zoo too.

This won't work, as it will also depend on the log interval…
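As a rough illustration of the monitor-file approach, the sketch below reads per-episode returns from Monitor-style CSV text and applies a trailing moving average whose window can be chosen freely (`window=1` gives the raw per-episode return). It assumes the Monitor CSV layout of one `#`-prefixed JSON header line followed by `r,l,t` columns; the file contents and the helper names `load_episode_returns` and `moving_average` are hypothetical, not the RL Zoo's plotting script.

```python
import csv


def load_episode_returns(text):
    """Read episode returns from Monitor-style CSV text (assumed layout)."""
    # Skip the '#'-prefixed metadata line, then parse the r,l,t columns.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(lines)
    return [float(row["r"]) for row in reader]


def moving_average(values, window):
    """Trailing mean over up to `window` episodes (window=1: raw returns)."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical monitor file contents.
MONITOR_CSV = """#{"t_start": 0.0, "env_id": "MyEnv-v0"}
r,l,t
10.0,10,0.1
20.0,20,0.3
30.0,30,0.6
40.0,40,1.0
"""

returns = load_episode_returns(MONITOR_CSV)
print(returns)                     # [10.0, 20.0, 30.0, 40.0]
print(moving_average(returns, 1))  # [10.0, 20.0, 30.0, 40.0]
print(moving_average(returns, 2))  # [10.0, 15.0, 25.0, 35.0]
```

Because every finished episode is written to the file, the smoothing window is a plotting choice made after training rather than something fixed by `ep_info_buffer` or the log interval.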
Hello, please fill in the issue template completely 😉 (including the proposed solution and alternatives).
I’m not sure what you mean by “actual return”