MARWIL error
System information
- OS: Ubuntu 18.04 LTS
- Ray installed from: pip install Ray
- Ray version: 0.6.3
- Python version: 3.6
- Exact command to reproduce: see the code below
Describe the problem
register_env("my_env", env_creator)
ModelCatalog.register_custom_model("pa_model", ParametricActionsModel)
config['model']["custom_model"] = "pa_model"
......
agent = MARWILAgent(config=config, env="my_env")
I cannot identify the cause of the error; there is very little documentation for cases like this. The existing examples and code covering custom models plus the right settings are not sufficient. A code example for MARWIL with a custom LSTM model and the matching configuration would be very helpful.
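For reference, here is a minimal sketch of a custom model wired into the Ray 0.6.x model API (_build_layers_v2, the same hook that appears in the traceback below). Everything in it is an illustrative assumption: the class name, layer sizes, and the flat Box observation are not the reporter's actual setup. The point is that the output layer is sized from num_outputs instead of a hard-coded action count, which is the mismatch the error below complains about.

import tensorflow as tf
from ray.rllib.models import Model, ModelCatalog

class MyCustomModel(Model):
    # Illustrative only: assumes a flat Box observation and two dense layers.
    def _build_layers_v2(self, input_dict, num_outputs, options):
        # num_outputs is chosen by RLlib from the action distribution
        # (e.g. 2 * action_dim for a diagonal Gaussian over a Box space),
        # so the final layer should use it rather than a fixed action count.
        last_layer = tf.layers.dense(
            input_dict["obs"], 256, activation=tf.nn.relu, name="fc1")
        output = tf.layers.dense(
            last_layer, num_outputs, activation=None, name="fc_out")
        return output, last_layer

ModelCatalog.register_custom_model("my_model", MyCustomModel)
# then: config["model"]["custom_model"] = "my_model", as in the snippet above
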
Source code / logs
Traceback (most recent call last):
File "/home/llu/c7_triangle/train_MARWIL.py", line 128, in <module>
agent = MARWILAgent(config=config, env="my_env")
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 257, in __init__
Trainable.__init__(self, config, logger_creator)
File "/home/llu/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 88, in __init__
self._setup(copy.deepcopy(self.config))
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 333, in _setup
self._init()
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil.py", line 49, in _init
self.env_creator, self._policy_graph)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 466, in make_local_evaluator
extra_config or {}))
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 608, in _make_evaluator
output_creator=output_creator)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 274, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 611, in _build_policy_map
policy_map[name] = cls(obs_space, act_space, merged_conf)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil_policy_graph.py", line 69, in __init__
self.obs_t, observation_space, logit_dim)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil_policy_graph.py", line 127, in _build_policy_network
}, obs_space, logit_dim, self.config["model"])
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/catalog.py", line 198, in get_model
options, state_in, seq_lens)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/catalog.py", line 227, in _get_model
seq_lens=seq_lens)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/model.py", line 74, in __init__
num_outputs, options)
File "/home/llu/c7_triangle/train_MARWIL.py", line 32, in _build_layers_v2
num_outputs, avail_actions)
ValueError: ('This model assumes num outputs is equal to max avail actions', 42, <tf.Tensor 'default/p_func/Reshape_1:0' shape=(?, 21, 21) dtype=float32>)
Exception ignored in: <bound method PolicyEvaluator.__del__ of <ray.rllib.evaluation.policy_evaluator.PolicyEvaluator object at 0x7f7b147c0c88>>
Traceback (most recent call last):
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 615, in __del__
if isinstance(self.sampler, AsyncSampler):
AttributeError: 'PolicyEvaluator' object has no attribute 'sampler'
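Reading the traceback: the ValueError is raised inside the reporter's own _build_layers_v2 (line 32 of train_MARWIL.py), which appears to carry the same num_outputs check as RLlib's parametric-actions example. MARWIL asks the model for 42 outputs while the avail_actions tensor only holds 21 actions (shape (?, 21, 21)). Notably, 42 is 2 x 21, which is what a diagonal-Gaussian distribution over a 21-dimensional continuous Box action space would need (a mean and a log-std per dimension). The arithmetic below is a hedged illustration; the 21-dimensional Box space is inferred from the shapes above, not confirmed by the reporter.

import numpy as np
from gym.spaces import Box

# Assumed 21-dimensional continuous action space (inferred from the
# (?, 21, 21) avail_actions shape and the "42" in the error message).
action_space = Box(low=-1.0, high=1.0, shape=(21,))

# For a Box space RLlib parameterizes a diagonal Gaussian, which needs a
# mean and a log-std per action dimension:
logit_dim = 2 * int(np.prod(action_space.shape))
print(logit_dim)  # -> 42

# A parametric-actions model instead expects num_outputs to equal the
# number of maskable discrete actions (avail_actions.shape[1] == 21),
# hence the "42 vs 21" ValueError when MARWIL builds its policy network.
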
Top GitHub Comments
I would try PPO or DDPG. As for the sample prioritization, it would probably be better to just increase the batch size in PPO to avoid introducing bias, though DDPG/Ape-X DDPG has importance-weighted replay prioritization as an option (it might even be the default?).
On Wed, Feb 13, 2019, 1:55 AM aGiant notifications@github.com wrote:
Samples with negative returns far outnumber those with positive returns, yet the agent should mostly learn from positive returns or do nothing. Negative returns should not play an important role, but they contribute so many samples that they break the balance between negative and positive. Choosing and learning mostly from positive samples would be better for training. Questions: which algorithm in Ray should I choose for high-dimensional continuous actions? Or how can I build a custom process for that purpose? Are there any examples to follow? Many thanks!
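For completeness, a rough sketch of what the suggestion above could look like in Ray 0.6.x. The agent class names and config keys follow that era's API (PPOAgent/DDPGAgent rather than today's Trainer classes); the specific values are placeholders, not tuned recommendations.

import ray
from ray.rllib.agents.ppo import PPOAgent
from ray.rllib.agents.ddpg import DDPGAgent

ray.init()

# Option 1: PPO with a larger train batch, so positive-return samples are
# not drowned out within any single update.
ppo_agent = PPOAgent(
    config={"train_batch_size": 16000, "sample_batch_size": 200},
    env="my_env")

# Option 2: DDPG with prioritized (importance-weighted) replay, as
# mentioned in the comment above.
ddpg_agent = DDPGAgent(
    config={"prioritized_replay": True},
    env="my_env")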