[rllib] MeanStdFilter value problem during compute action
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- Ray installed from (source or binary): source
- Ray version: 0.5.0
- Python version: 3.6.1
- Exact command to reproduce:
Describe the problem
When I evaluate agents that have already been trained, they perform much worse than the rewards I monitored during training would suggest.
I also found that when I restore an agent, its observation filter (MeanStdFilter) contains strange values.
Has anyone seen this problem before?
Source code / logs
agent = cls(env="prosthetics", config=config)
agent.restore(checkpoint_path)
print(agent.local_evaluator.filters)
>>> {'default': MeanStdFilter((158,), True, True, None, (n=128033777, mean_mean=-1.0975956375310988e+181, mean_std=inf), (n=0, mean_mean=0.0, mean_std=0.0))}
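To illustrate how a running filter can end up with a huge mean and an infinite standard deviation, here is a minimal sketch of a running mean/std normalizer in the spirit of MeanStdFilter. It is not RLlib's code: the class name `RunningMeanStd` is hypothetical, and the use of Welford's algorithm is an assumption about how such statistics are typically accumulated. A single observation with a very large component overflows the squared-deviation accumulator to inf. Because each Welford step only adds a non-negative amount to that accumulator, it stays inf for every later update.

```python
import numpy as np


class RunningMeanStd:
    """Hypothetical running mean/std normalizer using Welford's algorithm.

    Illustrative only; not RLlib's MeanStdFilter implementation.
    """

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape, dtype=np.float64)
        self.m2 = np.zeros(shape, dtype=np.float64)  # sum of squared deviations

    def push(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.n
        # delta * (x - new_mean) is never negative, so m2 can only grow.
        self.m2 = self.m2 + delta * (x - self.mean)

    @property
    def std(self):
        if self.n < 2:
            return np.ones_like(self.mean)
        return np.sqrt(self.m2 / (self.n - 1))

    def normalize(self, x, eps=1e-8):
        return (np.asarray(x, dtype=np.float64) - self.mean) / (self.std + eps)


rs = RunningMeanStd(shape=(2,))
for obs in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    rs.push(obs)
print("healthy:", rs.mean, rs.std)

# One observation with an enormous first component overflows m2[0] to inf.
with np.errstate(over="ignore"):
    rs.push([1e200, 0.0])
print("corrupted:", rs.mean, rs.std)
print("normalized:", rs.normalize([3.0, 4.0]))
```

After the bad update, component 0 has a mean around 2.5e199 and an infinite std, so any ordinary observation is normalized to (signed) zero in that component. If filter state is saved with the checkpoint, as the restored values above suggest, restoring would bring the corrupted statistics back as well.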
Issue Analytics
- Created 5 years ago
- Comments:22 (11 by maintainers)
Top GitHub Comments
@whikwon @ericl Thanks for all the guidance. Here is what I've found. The environment is Pendulum-v0. In agent.compute_action, I logged the obs values before and after the filter. As you will see below, the filtered values are either extremely small or extremely large.
So it seems the filter is not being applied correctly in the code below:
You can probably add a print() to determine which update causes it to reach an inf value.
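One way to follow this suggestion without modifying RLlib internals is to record the raw observations (for example, the pre-filter values logged in compute_action) and replay them through the same kind of running update, stopping at the first update that leaves the statistics non-finite. This is a minimal sketch: `first_nonfinite_update` is a hypothetical helper, and it uses Welford's update, which may differ in detail from the RunningStat code inside RLlib.

```python
import numpy as np


def first_nonfinite_update(observations):
    """Replay observations through a Welford mean/std update.

    Returns the index of the first observation after which the running mean
    or squared-deviation sum is no longer finite, or None if all updates stay
    finite. Hypothetical debugging helper, not part of RLlib.
    """
    n = 0
    mean = m2 = None
    for i, obs in enumerate(observations):
        x = np.asarray(obs, dtype=np.float64)
        if mean is None:
            mean = np.zeros_like(x)
            m2 = np.zeros_like(x)
        n += 1
        delta = x - mean
        with np.errstate(over="ignore", invalid="ignore"):
            mean = mean + delta / n
            m2 = m2 + delta * (x - mean)
        if not (np.all(np.isfinite(mean)) and np.all(np.isfinite(m2))):
            # Print the offending observation and the state it produced.
            print(f"update {i}: obs={x}, mean={mean}, m2={m2}")
            return i
    return None


# Hypothetical recorded observations; the third one is the culprit.
obs_log = [[0.1, -0.2], [0.3, 0.0], [1e200, 0.1], [0.2, 0.2]]
print("first bad update:", first_nonfinite_update(obs_log))
```

Checking the statistics right after each update, rather than only the filtered output, points directly at the observation that broke the filter, whether it came from the environment itself or from a problem in how observations are passed in.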