[rllib] MeanStdFilter value problem during compute action


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Ray installed from (source or binary): source
  • Ray version: 0.5.0
  • Python version: 3.6.1
  • Exact command to reproduce:

Describe the problem

When I run trained agents for evaluation, they perform much worse than the rewards I monitored during training would suggest.

I also found that after restoring an agent, its observation filter (MeanStdFilter) holds strange values: in the output below, the running mean is about -1.1e+181 and the standard deviation is inf. Has anyone seen this problem before?

Source code / logs

agent = cls(env="prosthetics", config=config)
agent.restore(checkpoint_path)
print(agent.local_evaluator.filters)
>>> {'default': MeanStdFilter((158,), True, True, None, (n=128033777, mean_mean=-1.0975956375310988e+181, mean_std=inf), (n=0, mean_mean=0.0, mean_std=0.0))}

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 22 (11 by maintainers)

Top GitHub Comments

RodgerLuo commented, Aug 30, 2018 (1 reaction)

@whikwon @ericl Thanks for all the guidance. Here is what I’ve found. The environment is Pendulum-v0. In agent.compute_action, I logged the value of obs before and after the filter. In each pair below, the first line is the raw observation and the second is the filtered one; every filtered value is roughly 1e8 times the raw value.

[-0.87094525  0.49138007 -2.64594034]
[-8.70945248e+07  4.91380071e+07 -2.64594034e+08]
 
[-0.81813912  0.57502033 -1.97910756]
[-8.18139124e+07  5.75020325e+07 -1.97910756e+08]
 
[-0.78058041  0.62505538 -1.25146981]
[-7.80580405e+07  6.25055383e+07 -1.25146981e+08]
 
[-0.74561678  0.66637499 -1.08267827]
[-7.45616776e+07  6.66374987e+07 -1.08267827e+08]
 
[-0.71548291  0.69863024 -0.88289703]
[-71548290.6009737   69863023.92595541 -88289703.02494954]
 
[-0.69208157  0.7218193  -0.65892435]
[-69208156.94613262  72181929.95562999 -65892435.08048297]
 
[-0.67686169  0.73611021 -0.41755988]
[-67686169.49385323  73611021.31643993 -41755987.61376046]
 
[-0.67074812  0.74168521 -0.16547722]
[-67074812.32289593  74168521.30013305 -16547721.62643052]
 
[-0.67422883  0.73852251  0.09405967]
[-67422882.58069217  73852250.50403133   9405966.79778121]
 
[-0.68740092  0.72627817  0.35968684]
[-68740091.88581493  72627816.76141532  35968683.70469721]

So it seems the filter is not being applied correctly in the code below:

filtered_obs = self.local_evaluator.filters[policy_id](
    observation, update=False)
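
One possible reading of the numbers above, offered as an assumption rather than a confirmed diagnosis: if the filter normalizes as (obs - mean) / (std + 1e-8) and the statistics it uses are still at their initial values (mean 0, std 0, like the n=0 entry in the filter printout from the issue description), then every observation is simply divided by 1e-8. The sketch below reproduces that scaling with plain NumPy. The normalize helper and the 1e-8 constant are illustrative and are not taken from the issue.

```python
import numpy as np

EPS = 1e-8  # assumed stabiliser added to the std before dividing


def normalize(obs, mean, std, eps=EPS):
    """Mean/std normalisation in the assumed form: (obs - mean) / (std + eps)."""
    return (np.asarray(obs, dtype=np.float64) - mean) / (std + eps)


# First raw observation from the log above.
obs = np.array([-0.87094525, 0.49138007, -2.64594034])

# Statistics of a filter that has seen no samples: mean 0, std 0.
empty_mean = np.zeros(3)
empty_std = np.zeros(3)

filtered = normalize(obs, empty_mean, empty_std)
print(filtered)  # each entry is about 1e8 times the raw value
```

If this reading is right, the thing to check is whether the statistics actually used in compute_action are the trained ones or a freshly initialized set.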

ericl commented, Aug 22, 2018 (1 reaction)

You can probably add a print() to determine what update causes it to reach an inf value.
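
A minimal sketch of that kind of check, assuming the filter keeps a Welford-style running mean and variance. RunningStat and first_bad_update are hypothetical names written for this example and are not RLlib's classes; in a real run the same print would go next to the filter's own update step.

```python
import numpy as np


class RunningStat:
    """Welford-style running mean/variance (illustrative, not RLlib's class)."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations

    def push(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        # Silence NumPy overflow warnings; the caller checks for inf/nan itself.
        with np.errstate(all="ignore"):
            delta = x - self.mean
            self.mean = self.mean + delta / self.n
            self.m2 = self.m2 + delta * (x - self.mean)

    @property
    def std(self):
        if self.n > 1:
            return np.sqrt(self.m2 / (self.n - 1))
        return np.zeros_like(self.mean)


def first_bad_update(observations, shape):
    """Return (index, obs) of the first update that makes mean or std non-finite, else None."""
    rs = RunningStat(shape)
    for i, obs in enumerate(observations):
        rs.push(obs)
        if not (np.all(np.isfinite(rs.mean)) and np.all(np.isfinite(rs.std))):
            print(f"update {i}: obs={obs!r} -> mean={rs.mean}, std={rs.std}")
            return i, obs
    return None
```

For example, first_bad_update([[0.1, 0.2], [0.3, -0.1], [1e200, 0.0]], (2,)) flags the third observation, because squaring a deviation of that magnitude overflows float64.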
