MARWIL error
System information
- OS: Ubuntu 18.04 LTS
- Ray installed from: pip install Ray
- Ray version: 0.6.3
- Python version: 3.6
- Exact command to reproduce: see the code below
Describe the problem
register_env("my_env", env_creator)
ModelCatalog.register_custom_model("pa_model", ParametricActionsModel)
config['model']["custom_model"] = "pa_model"
......
agent = MARWILAgent(config=config, env="my_env")
I cannot identify the cause of the error; there is very little documentation for cases like this. The existing examples and code covering custom models plus the right settings are not sufficient. A code example for MARWIL with a custom LSTM model and the matching configuration would be very helpful.
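For reference, here is a minimal sketch of a custom model wired into the Ray 0.6.x model API (_build_layers_v2, the same hook that appears in the traceback below). Everything in it is an illustrative assumption: the class name, layer sizes, and the flat Box observation are not the reporter's actual setup. The point is that the output layer is sized from num_outputs instead of a hard-coded action count, which is the mismatch the error below complains about.

import tensorflow as tf
from ray.rllib.models import Model, ModelCatalog

class MyCustomModel(Model):
    # Illustrative only: assumes a flat Box observation and two dense layers.
    def _build_layers_v2(self, input_dict, num_outputs, options):
        # num_outputs is chosen by RLlib from the action distribution
        # (e.g. 2 * action_dim for a diagonal Gaussian over a Box space),
        # so the final layer should use it rather than a fixed action count.
        last_layer = tf.layers.dense(
            input_dict["obs"], 256, activation=tf.nn.relu, name="fc1")
        output = tf.layers.dense(
            last_layer, num_outputs, activation=None, name="fc_out")
        return output, last_layer

ModelCatalog.register_custom_model("my_model", MyCustomModel)
# then: config["model"]["custom_model"] = "my_model", as in the snippet above
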
Source code / logs
Traceback (most recent call last):
File "/home/llu/c7_triangle/train_MARWIL.py", line 128, in <module>
agent = MARWILAgent(config=config, env="my_env")
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 257, in __init__
Trainable.__init__(self, config, logger_creator)
File "/home/llu/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 88, in __init__
self._setup(copy.deepcopy(self.config))
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 333, in _setup
self._init()
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil.py", line 49, in _init
self.env_creator, self._policy_graph)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 466, in make_local_evaluator
extra_config or {}))
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/agent.py", line 608, in _make_evaluator
output_creator=output_creator)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 274, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 611, in _build_policy_map
policy_map[name] = cls(obs_space, act_space, merged_conf)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil_policy_graph.py", line 69, in __init__
self.obs_t, observation_space, logit_dim)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/agents/marwil/marwil_policy_graph.py", line 127, in _build_policy_network
}, obs_space, logit_dim, self.config["model"])
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/catalog.py", line 198, in get_model
options, state_in, seq_lens)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/catalog.py", line 227, in _get_model
seq_lens=seq_lens)
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/models/model.py", line 74, in __init__
num_outputs, options)
File "/home/llu/c7_triangle/train_MARWIL.py", line 32, in _build_layers_v2
num_outputs, avail_actions)
ValueError: ('This model assumes num outputs is equal to max avail actions', 42, <tf.Tensor 'default/p_func/Reshape_1:0' shape=(?, 21, 21) dtype=float32>)
Exception ignored in: <bound method PolicyEvaluator.__del__ of <ray.rllib.evaluation.policy_evaluator.PolicyEvaluator object at 0x7f7b147c0c88>>
Traceback (most recent call last):
File "/home/llu/.local/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 615, in __del__
if isinstance(self.sampler, AsyncSampler):
AttributeError: 'PolicyEvaluator' object has no attribute 'sampler'
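Reading the traceback: the ValueError is raised inside the reporter's own _build_layers_v2 (line 32 of train_MARWIL.py), which appears to carry the same num_outputs check as RLlib's parametric-actions example. MARWIL asks the model for 42 outputs while the avail_actions tensor only holds 21 actions (shape (?, 21, 21)). Notably, 42 is 2 x 21, which is what a diagonal-Gaussian distribution over a 21-dimensional continuous Box action space would need (a mean and a log-std per dimension). The arithmetic below is a hedged illustration; the 21-dimensional Box space is inferred from the shapes above, not confirmed by the reporter.

import numpy as np
from gym.spaces import Box

# Assumed 21-dimensional continuous action space (inferred from the
# (?, 21, 21) avail_actions shape and the "42" in the error message).
action_space = Box(low=-1.0, high=1.0, shape=(21,))

# For a Box space RLlib parameterizes a diagonal Gaussian, which needs a
# mean and a log-std per action dimension:
logit_dim = 2 * int(np.prod(action_space.shape))
print(logit_dim)  # -> 42

# A parametric-actions model instead expects num_outputs to equal the
# number of maskable discrete actions (avail_actions.shape[1] == 21),
# hence the "42 vs 21" ValueError when MARWIL builds its policy network.
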
Top GitHub Comments
I would try PPO or DDPG. As for the sample prioritization, it would probably be better to just increase the batch size in PPO to avoid introducing bias, though DDPG/Ape-X DDPG has importance-weighted replay prioritization as an option (it might even be the default?).
On Wed, Feb 13, 2019, 1:55 AM aGiant notifications@github.com wrote:
Samples with negative returns far outnumber those with positive returns, yet the agent should mostly learn from positive returns or do nothing. Negative returns should not play an important role, but they contribute so many samples that they break the balance between negative and positive. Choosing and learning mostly from positive samples would be better for training. Questions: which algorithm in Ray should I choose for high-dimensional continuous actions? Or how can I build a custom process for that purpose? Are there any examples to follow? Many thanks!
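For completeness, a rough sketch of what the suggestion above could look like in Ray 0.6.x. The agent class names and config keys follow that era's API (PPOAgent/DDPGAgent rather than today's Trainer classes); the specific values are placeholders, not tuned recommendations.

import ray
from ray.rllib.agents.ppo import PPOAgent
from ray.rllib.agents.ddpg import DDPGAgent

ray.init()

# Option 1: PPO with a larger train batch, so positive-return samples are
# not drowned out within any single update.
ppo_agent = PPOAgent(
    config={"train_batch_size": 16000, "sample_batch_size": 200},
    env="my_env")

# Option 2: DDPG with prioritized (importance-weighted) replay, as
# mentioned in the comment above.
ddpg_agent = DDPGAgent(
    config={"prioritized_replay": True},
    env="my_env")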