[rllib] MeanStdFilter value problem during compute action

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
Ray installed from (source or binary): source
Ray version: 0.5.0
Python version: 3.6.1
Exact command to reproduce:

Describe the problem

When I run trained agents for evaluation, they perform much worse than the rewards I monitored during training would suggest.

I’ve also found that after I restore the agent, the filter (MeanStdFilter) holds strange values. Has anyone seen this problem before?

Source code / logs

agent = cls(env="prosthetics", config=config)
agent.restore(checkpoint_path)
print(agent.local_evaluator.filters)
>>> {'default': MeanStdFilter((158,), True, True, None, (n=128033777, mean_mean=-1.0975956375310988e+181, mean_std=inf), (n=0, mean_mean=0.0, mean_std=0.0))}
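
The printout above already shows the symptom: a mean on the order of -1e181 and an infinite std. A minimal sketch of a sanity check that could be run right after restoring is given below. The helper name stats_are_healthy and the stand-in arrays are hypothetical. The printout only reports averages over the 158 dimensions, so the per-dimension values are not known, and how to read n, mean, and std off the real filter object depends on the Ray version, so that step is left out.

```python
import numpy as np

def stats_are_healthy(n, mean, std):
    """Return True when running statistics look usable for normalization.

    Hypothetical helper: it only inspects plain numbers/arrays and does not
    touch any Ray object.
    """
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    has_samples = n > 0
    all_finite = np.all(np.isfinite(mean)) and np.all(np.isfinite(std))
    return bool(has_samples and all_finite)

# Stand-in arrays shaped like the restored filter (158-dimensional observations).
restored_ok = stats_are_healthy(
    128033777, np.full(158, -1.0975956375310988e+181), np.full(158, np.inf)
)
empty_ok = stats_are_healthy(0, np.zeros(158), np.zeros(158))
print(restored_ok, empty_ok)
```

Both calls report unhealthy statistics: the first because the std is infinite, the second because no samples were recorded.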

Issue Analytics

State:
Created 5 years ago
Comments: 22 (11 by maintainers)

Top GitHub Comments

RodgerLuo commented, Aug 30, 2018

@whikwon @ericl Thanks for all the guidance. Here is what I’ve found. The environment is Pendulum-v0. In agent.compute_action, I log the values of obs before and after the filter. In each pair below, the first line is the raw observation and the second is the filtered one. The filtered values are extremely large in magnitude, about 1e8 times the raw values.

[-0.87094525  0.49138007 -2.64594034]
[-8.70945248e+07  4.91380071e+07 -2.64594034e+08]
 
[-0.81813912  0.57502033 -1.97910756]
[-8.18139124e+07  5.75020325e+07 -1.97910756e+08]
 
[-0.78058041  0.62505538 -1.25146981]
[-7.80580405e+07  6.25055383e+07 -1.25146981e+08]
 
[-0.74561678  0.66637499 -1.08267827]
[-7.45616776e+07  6.66374987e+07 -1.08267827e+08]
 
[-0.71548291  0.69863024 -0.88289703]
[-71548290.6009737   69863023.92595541 -88289703.02494954]
 
[-0.69208157  0.7218193  -0.65892435]
[-69208156.94613262  72181929.95562999 -65892435.08048297]
 
[-0.67686169  0.73611021 -0.41755988]
[-67686169.49385323  73611021.31643993 -41755987.61376046]
 
[-0.67074812  0.74168521 -0.16547722]
[-67074812.32289593  74168521.30013305 -16547721.62643052]
 
[-0.67422883  0.73852251  0.09405967]
[-67422882.58069217  73852250.50403133   9405966.79778121]
 
[-0.68740092  0.72627817  0.35968684]
[-68740091.88581493  72627816.76141532  35968683.70469721]

So it seems like the filter is not applied correctly in the code below:

 filtered_obs = self.local_evaluator.filters[policy_id](
    observation, update=False)
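
One possible reading of the logged pairs, not confirmed in this thread: every filtered value is almost exactly 1e8 times the raw one. That is what a mean-std normalization would produce if it computes (x - mean) / (std + 1e-8) while its statistics are still all zeros. The formula, the 1e-8 stabilizer, and the function name mean_std_normalize are assumptions made for this sketch, not a quote of the RLlib source.

```python
import numpy as np

EPS = 1e-8  # assumed stabilizer added to the std before dividing

def mean_std_normalize(x, mean, std, eps=EPS):
    """Assumed normalization: subtract the mean, divide by (std + eps)."""
    return (np.asarray(x, dtype=np.float64) - mean) / (std + eps)

# First raw observation from the log above.
obs = np.array([-0.87094525, 0.49138007, -2.64594034])

# Statistics that have never been updated: mean 0, std 0.
filtered = mean_std_normalize(obs, mean=np.zeros(3), std=np.zeros(3))

# Filtered values reported in the log for the same observation.
logged = np.array([-8.70945248e+07, 4.91380071e+07, -2.64594034e+08])
print(filtered)
print(np.allclose(filtered, logged, rtol=1e-6))
```

If this reading is right, the filter used at compute_action time would be one whose statistics were never populated or were reset, rather than one that is applied with the wrong arguments.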

ericl commented, Aug 22, 2018

You can probably add a print() to determine which update causes it to reach an inf value.
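
The suggestion above can be sketched on a standalone running statistic. The class below is a hypothetical Welford-style accumulator written for illustration, not the RLlib implementation. It prints a message the first time an update leaves the mean or the variance accumulator non-finite, and the corrupted observation in the usage data is made up.

```python
import numpy as np

class CheckedRunningStat:
    """Welford running mean/variance that reports the first bad update."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations
        self.first_bad_update = None

    def push(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        # inf/nan arithmetic is expected here, so silence NumPy warnings.
        with np.errstate(invalid="ignore", over="ignore"):
            delta = x - self.mean
            self.mean = self.mean + delta / self.n
            self.m2 = self.m2 + delta * (x - self.mean)
        finite = np.all(np.isfinite(self.mean)) and np.all(np.isfinite(self.m2))
        if not finite and self.first_bad_update is None:
            self.first_bad_update = self.n
            print(f"update {self.n}: non-finite statistics after observing {x}")

stat = CheckedRunningStat(shape=(3,))
observations = [
    [-0.87, 0.49, -2.64],
    [-0.81, 0.57, -1.97],
    [np.inf, 0.62, -1.25],  # hypothetical corrupted observation
    [-0.74, 0.66, -1.08],
]
for obs in observations:
    stat.push(obs)
print(stat.first_bad_update)
```

Recording only the first offending update keeps the log short, since every later update stays non-finite once the statistics are poisoned.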
