
System information

  • OS: Ubuntu 18.04 LTS
  • Ray installed from: pip install ray
  • Ray version: 0.6.3
  • Python version: 3.6
  • Exact command to reproduce:

Describe the problem

register_env("my_env", env_creator)
ModelCatalog.register_custom_model("pa_model", ParametricActionsModel)

config["model"]["custom_model"] = "pa_model"
......
agent = MARWILAgent(config=config, env="my_env")

The error could not be identified; there is really too little documentation for cases like this. Examples of custom models together with the right settings are insufficient. A code example for MARWIL with a custom LSTM and its configuration is needed.

Source code / logs

Traceback (most recent call last):
  File "/home/llu/c7_triangle/train_MARWIL.py", line 128, in <module>
    agent = MARWILAgent(config=config, env="my_env")
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 257, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 88, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 333, in _setup
    self._init()
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil.py", line 49, in _init
    self.env_creator, self._policy_graph)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 466, in make_local_evaluator
    extra_config or {}))
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 608, in _make_evaluator
    output_creator=output_creator)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 274, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 611, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil_policy_graph.py", line 69, in __init__
    self.obs_t, observation_space, logit_dim)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil_policy_graph.py", line 127, in _build_policy_network
    }, obs_space, logit_dim, self.config["model"])
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/catalog.py", line 198, in get_model
    options, state_in, seq_lens)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/catalog.py", line 227, in _get_model
    seq_lens=seq_lens)
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/model.py", line 74, in __init__
    num_outputs, options)
  File "/home/llu/c7_triangle/train_MARWIL.py", line 32, in _build_layers_v2
    num_outputs, avail_actions)
ValueError: ('This model assumes num outputs is equal to max avail actions', 42, <tf.Tensor 'default/p_func/Reshape_1:0' shape=(?, 21, 21) dtype=float32>)
Exception ignored in: <bound method PolicyEvaluator.__del__ of <ray.rllib.evaluation.policy_evaluator.PolicyEvaluator object at 0x7f7b147c0c88>>
Traceback (most recent call last):
  File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 615, in __del__
    if isinstance(self.sampler, AsyncSampler):
AttributeError: 'PolicyEvaluator' object has no attribute 'sampler'
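The ValueError above is raised by the parametric-actions example model, which asserts that `num_outputs` equals the maximum number of available actions. Here MARWIL requested 42 outputs while the mask tensor has 21 slots (42 is plausibly 2 × 21, a mean and log-std per dimension of a continuous action distribution, though the issue does not confirm this). The masking logic that assertion protects can be sketched in plain numpy; all names and sizes below are hypothetical:

```python
import numpy as np

MAX_AVAIL_ACTIONS = 21  # hypothetical; must equal the model's num_outputs

def masked_action_logits(action_embeddings, context, action_mask):
    """Score each candidate action against a context vector, then push
    unavailable actions to a large negative logit, as in RLlib's
    parametric-actions pattern.

    action_embeddings: (MAX_AVAIL_ACTIONS, embed_dim)
    context:           (embed_dim,)
    action_mask:       (MAX_AVAIL_ACTIONS,) of 0/1
    """
    logits = action_embeddings @ context            # one logit per action slot
    inf_mask = np.where(action_mask > 0, 0.0, -1e9)  # ~ -inf for masked slots
    return logits + inf_mask                        # shape (MAX_AVAIL_ACTIONS,)

rng = np.random.default_rng(0)
emb = rng.normal(size=(MAX_AVAIL_ACTIONS, 8))
ctx = rng.normal(size=8)
mask = np.zeros(MAX_AVAIL_ACTIONS)
mask[:5] = 1                                        # only 5 actions available
out = masked_action_logits(emb, ctx, mask)
assert out.shape == (MAX_AVAIL_ACTIONS,)            # num_outputs must match this
```

Because the output has one slot per action, any algorithm that asks the catalog for a different `logit_dim` (as MARWIL apparently did here) will trip the model's assertion.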

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
ericl commented, Feb 13, 2019

I would try PPO or DDPG. As for sample prioritization, it would probably be better to just increase the batch size in PPO to avoid introducing bias, though DDPG/Ape-X DDPG has importance-weighted replay prioritization as an option (it might even be the default)?
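Following that suggestion, a sketch of the PPO config knobs involved; the key names match Ray ~0.6-era PPO defaults, but the values are illustrative guesses rather than tuned recommendations:

```python
# Hypothetical config tweaks for PPO in Ray ~0.6: a larger train batch sees
# more of the rare positive-return samples per update, reducing variance
# without the bias that hand-picked replay would introduce.
ppo_config = {
    "train_batch_size": 32000,    # default was 4000; enlarge to capture more positives
    "sgd_minibatch_size": 256,    # SGD chunk size within each train batch
    "num_sgd_iter": 20,           # passes over the batch per update
    "model": {"custom_model": "pa_model"},  # the custom model registered above
}
```

These keys would be merged into the agent config the same way `config["model"]["custom_model"]` is set in the report above.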


0 reactions
aGiant commented, Feb 13, 2019

Samples with negative returns vastly outnumber those with positive returns, yet the agent should mostly learn from positive returns (or do nothing). Negative returns should play only a minor role, but their sheer number breaks the balance between negative and positive samples. Selecting and learning mostly from positive samples would be better for training. Questions: which algorithm in Ray should I choose for high-dimensional continuous actions? Or how can I build a custom process for that purpose? Are there any examples to follow? Many thanks!
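One crude way to act on this idea, independent of any RLlib API, is to keep every positive-return episode and subsample the negatives before training. The batch format below is hypothetical and would need adapting to however your trajectories are actually stored; note that downweighting negatives this way does introduce the bias ericl cautions about above:

```python
import numpy as np

def rebalance_by_return(episodes, keep_negative_frac=0.2, seed=0):
    """Keep all positive-return episodes, subsample negative ones.

    `episodes` is a list of (trajectory, total_return) pairs -- a made-up
    format for illustration. Returns a rebalanced list.
    """
    rng = np.random.default_rng(seed)
    positives = [e for e in episodes if e[1] > 0]
    negatives = [e for e in episodes if e[1] <= 0]
    n_keep = int(len(negatives) * keep_negative_frac)
    if n_keep:
        idx = rng.choice(len(negatives), size=n_keep, replace=False)
        kept_neg = [negatives[i] for i in idx]
    else:
        kept_neg = []
    return positives + kept_neg

episodes = [(f"traj{i}", -1.0) for i in range(10)] + [("p", 2.0), ("q", 0.5)]
balanced = rebalance_by_return(episodes)
```

With `keep_negative_frac=0.2`, the 10 negative episodes above shrink to 2, matching the 2 positives.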
