Eager execution fails with RNN

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
Ray installed from (source or binary): Binary
Ray version: 0.7.6
Python version: 3.7.5
Exact command to reproduce: Run rllib/examples/custom_keras_rnn_model.py after adding "eager": True to the Tune config.
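For reference, here is a minimal sketch of where the flag goes. It assumes the example launches training with tune.run("PPO", ...) and a config like the one in the maintainer comment further down. Only the "eager": True entry is the change described above. The helper name make_config is hypothetical.

```python
def make_config(env_name="RepeatAfterMeEnv"):
    """Hypothetical helper: Tune config for the RNN example with eager mode on."""
    return {
        "eager": True,  # the only addition needed to reproduce the failure
        "env": env_name,
        "env_config": {"repeat_delay": 2},
        "gamma": 0.9,
        "num_workers": 0,
        "num_envs_per_worker": 20,
        "entropy_coeff": 0.001,
        "num_sgd_iter": 5,
        "vf_loss_coeff": 1e-5,
        "model": {
            "custom_model": "rnn",  # registered via ModelCatalog.register_custom_model
            "max_seq_len": 20,
        },
    }


# With Ray installed and the example's model and envs registered, the dict
# would be handed to Tune, e.g.:
#     tune.run("PPO", config=make_config())
config = make_config()
```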

Describe the problem

I wanted to test TF eager execution with rllib/examples/custom_keras_rnn_model.py, but it failed. The assertion in make_tf_callable() fails because tf.executing_eagerly() returns False even in eager mode. After some debugging, I found that tf.executing_eagerly() starts returning the wrong value once rllib/models/catalog.py:258 executes, which accesses tune.registry._global_registry. This does not happen without an RNN, for example when running rllib/examples/custom_keras_model.py.

Source code / logs

2019-12-09 10:17:13,116	ERROR trial_runner.py:569 -- Error processing event.
Traceback (most recent call last):
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 351, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/worker.py", line 2121, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AssertionError): ray_PPO:train() (pid=16838, host=daewoo-linux)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 372, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/tune/trainable.py", line 96, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 492, in _setup
    self._init(self.config, self.env_creator)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 109, in _init
    self.config["num_workers"])
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 537, in _make_workers
    logdir=self.logdir)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 64, in __init__
    RolloutWorker, env_creator, policy, 0, self._local_config)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 220, in _make_worker
    _fake_sampler=config.get("_fake_sampler", False))
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 351, in __init__
    policy_dict, policy_config)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 764, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/policy/eager_tf_policy.py", line 244, in __init__
    before_loss_init(self, observation_space, action_space, config)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/ppo/ppo_policy.py", line 267, in setup_mixins
    ValueNetworkMixin.__init__(policy, obs_space, action_space, config)
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/agents/ppo/ppo_policy.py", line 239, in __init__
    @make_tf_callable(self.get_session())
  File "/home/neigh/miniconda3/envs/ray/lib/python3.7/site-packages/ray/rllib/utils/tf_ops.py", line 58, in make_tf_callable
    assert session_or_none is not None
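The last frame explains why the symptom is an assertion rather than an eager-specific error. The eager policy's get_session() returned None (the traceback passes it straight into make_tf_callable()), and with tf.executing_eagerly() reporting False the graph-mode check requires a session. The following pure-Python model of that branch is a simplified sketch, not the actual code in rllib/utils/tf_ops.py. The function name is hypothetical, and the executing_eagerly parameter stands in for tf.executing_eagerly().

```python
def make_tf_callable_model(session_or_none, executing_eagerly):
    """Hypothetical, simplified model of the branch in make_tf_callable().

    `executing_eagerly` stands in for tf.executing_eagerly().
    """
    if not executing_eagerly:
        # Graph mode needs a session to run the wrapped function in; this is
        # the assertion reported in the traceback (tf_ops.py, line 58).
        assert session_or_none is not None

    def make_wrapper(fn):
        # The real helper's wrapping logic is omitted; this model simply
        # returns the function unchanged.
        return fn

    return make_wrapper


# Eager policy (no session) with eager mode correctly detected: works.
value_fn = make_tf_callable_model(None, executing_eagerly=True)(lambda obs: obs * 2)

# Eager policy (no session) but the eager flag reads False: assertion fails.
try:
    make_tf_callable_model(None, executing_eagerly=False)
    failed = False
except AssertionError:
    failed = True
```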

Issue Analytics

State:
Created 4 years ago
Comments: 9 (7 by maintainers)

Top GitHub Comments

1 reaction

sven1977 commented, Dec 14, 2019

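It seems to work fine when we don’t use Tune, e.g.:

# Replacement for the __main__ block of rllib/examples/custom_keras_rnn_model.py.
# ray, ModelCatalog, register_env, parser, MyKerasRNN, RepeatAfterMeEnv and
# RepeatInitialEnv are assumed to be imported or defined earlier in that file.
from ray.rllib.agents.ppo import PPOTrainer

if __name__ == "__main__":
    ray.init()
    args = parser.parse_args()
    ModelCatalog.register_custom_model("rnn", MyKerasRNN)
    register_env("RepeatAfterMeEnv", lambda c: RepeatAfterMeEnv(c))
    register_env("RepeatInitialEnv", lambda _: RepeatInitialEnv())

    # Build the trainer directly instead of going through tune.run().
    trainer = PPOTrainer(config={
        "eager": True,
        "env": args.env,
        "env_config": {
            "repeat_delay": 2,
        },
        "gamma": 0.9,
        "num_workers": 0,
        "num_envs_per_worker": 20,
        "entropy_coeff": 0.001,
        "num_sgd_iter": 5,
        "vf_loss_coeff": 1e-5,
        "model": {
            "custom_model": "rnn",
            "max_seq_len": 20,
        },
    }, env=args.env)

    # Run a single training iteration.
    trainer.train()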
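The snippet above runs a single trainer.train() call, whereas Tune keeps iterating until a stop condition such as a target episode_reward_mean is met. Below is a minimal sketch of an equivalent loop without Tune. It assumes the trainer returns a result dict with an "episode_reward_mean" key, as RLlib trainers do. train_until, max_iters and FakeTrainer are hypothetical names.

```python
def train_until(trainer, target_reward, max_iters=100):
    """Call trainer.train() until episode_reward_mean reaches target_reward.

    Returns (iterations_run, last_result). `trainer` only needs a train()
    method returning a dict with an "episode_reward_mean" entry.
    """
    result = None
    for i in range(1, max_iters + 1):
        result = trainer.train()
        if result["episode_reward_mean"] >= target_reward:
            return i, result
    return max_iters, result


class FakeTrainer:
    """Stand-in for PPOTrainer so the loop can be exercised without Ray."""

    def __init__(self):
        self.reward = 0.0

    def train(self):
        self.reward += 25.0
        return {"episode_reward_mean": self.reward}


iters, last = train_until(FakeTrainer(), target_reward=90.0)
print(iters, last["episode_reward_mean"])  # 4 100.0
```

With the real PPOTrainer from the snippet, train_until(trainer, target_reward=...) would take the place of the single trainer.train() call.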

0 reactions

stale[bot] commented, Jan 23, 2021

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you’d still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray’s public Slack channel.

Thanks again for opening the issue!

Top Results From Across the Web

Tensorflow 2 eager execution disabled inside a custom layer

And I'm playing with a custom layer. import tensorflow as tf from tensorflow.keras.preprocessing import sequence from tensorflow.keras.layers ...

Text generation using a RNN with eager execution - Kaggle

Text generation using a RNN with eager execution. Python · Shakespeare ...

tf.compat.v1.enable_eager_execution | TensorFlow v2.11.0

Eager execution cannot be enabled after TensorFlow APIs have been used to create or execute graphs. It is typically recommended to invoke this ...

TensorFlow Eager Execution v.s. Graph (@tf.function)

This can be error-prone during deployment, in particular for NLP problems. Graph Mode Catches. However, there is a major catch for graph mode. ...

Inputs to eager execution function cannot be Keras symbolic ...

keras with TensorFlow 2.0. Below is my code This is working with TensorFlow 1.15 but getting the error in 2.0. you can check...

