[Bug] RLlib episode handling with concurrent episodes
Search before asking
- I searched the issues and found no similar issues.
Ray Component
RLlib
What happened + What you expected to happen
In my project, I use ExternalEnv to train a policy. For practical reasons, I need to create one long-lived "action episode" that is used only to continuously get actions. Meanwhile, separate episodes ("log episodes") are started to log actions and returns, and are then terminated. An error occurs after a "log episode" has ended and the get_action API is then called on the "action episode".
Error message:

--- Logging error ---
Traceback (most recent call last):
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/env/policy_client.py", line 273, in run
    samples = self.rollout_worker.sample()
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 757, in sample
    batches = [self.input_reader.next()]
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 103, in next
    batches = [self.get_data()]
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 265, in get_data
    item = next(self._env_runner)
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 656, in _env_runner
    eval_results = _do_policy_eval(
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 1068, in _do_policy_eval
    input_dict = sample_collector.get_inference_input_dict(policy_id)
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/collectors/simple_list_collector.py", line 627, in get_inference_input_dict
    collector = self.agent_collectors[k]
KeyError: (1460398824, 'agent0')
Reproduction: The server is the simple "CartPole-v0" server from the RLlib "External Agents and Applications" documentation section; the client script is provided under "Reproduction script" below, and a minimal server sketch follows here.
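Since the issue only points at the docs' server, the following is a minimal sketch modeled on RLlib's cartpole_server.py example for Ray 1.9.x. The trainer class (PPOTrainer), spaces, and config values below are assumptions for illustration, not necessarily the reporter's exact setup:

#!/usr/bin/env python
# Minimal policy server sketch (assumptions: PPOTrainer, CartPole spaces).
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env.policy_server_input import PolicyServerInput

SERVER_ADDRESS = "localhost"
SERVER_PORT = 9900  # Must match the client's --port.

if __name__ == "__main__":
    ray.init()
    trainer = PPOTrainer(
        env=None,  # No local env; observations come in from PolicyClients.
        config={
            # Serve experiences from connected PolicyClient instances.
            "input": lambda ioctx: PolicyServerInput(
                ioctx, SERVER_ADDRESS, SERVER_PORT
            ),
            # Spaces must be given explicitly because env=None.
            "observation_space": gym.spaces.Box(
                float("-inf"), float("inf"), (4,)
            ),
            "action_space": gym.spaces.Discrete(2),
            "num_workers": 0,
            "input_evaluation": [],
        },
    )
    while True:
        print(trainer.train())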
Suggested fix: Some debugging shows that self.forward_pass_agent_keys in simple_list_collector.py's get_inference_input_dict() is not updated when an episode ends. I added a method that updates this variable in simple_list_collector.py and called it from sampler.py's _process_observations(); a sketch of the idea follows. It seems to work, but I'm sure the Ray team will come up with a more proper solution.
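For illustration, here is a hypothetical sketch of that workaround. The method name is invented, and the attribute names follow the Ray 1.9 simple_list_collector.py internals as visible in the traceback; this is not the fix the Ray team merged:

# In rllib/evaluation/collectors/simple_list_collector.py, inside class
# SimpleListCollector (sketch only; names other than those in the
# traceback are assumptions):
def _purge_stale_forward_pass_keys(self, policy_id):
    """Drop forward-pass keys whose AgentCollector was already removed
    because its episode ended."""
    keys = self.forward_pass_agent_keys[policy_id]
    live = [k for k in keys if k in self.agent_collectors]
    self.forward_pass_agent_keys[policy_id] = live
    self.forward_pass_size[policy_id] = len(live)

# sampler.py's _process_observations() would then call
# sample_collector._purge_stale_forward_pass_keys(policy_id) whenever an
# episode ends, so that get_inference_input_dict() never indexes
# self.agent_collectors with a (episode_id, agent_id) key belonging to an
# already-finished episode -- which is exactly the KeyError above.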
Versions / Dependencies
- Python: 3.8.12
- Ray: 1.9.2
Reproduction script
#!/usr/bin/env python
"""
Example to reproduce a bug related to episode handling.
"""
import argparse

import gym

from ray.rllib.env.policy_client import PolicyClient

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-train", action="store_true", help="Whether to disable training."
)
parser.add_argument(
    "--inference-mode", type=str, default="local", choices=["local", "remote"]
)
parser.add_argument(
    "--off-policy",
    action="store_true",
    help="Whether to compute random actions instead of on-policy "
    "(Policy-computed) ones.",
)
parser.add_argument(
    "--stop-reward",
    type=float,
    default=9999,
    help="Stop once the specified reward is reached.",
)
parser.add_argument(
    "--port", type=int, default=9900, help="The port to use (on localhost)."
)

if __name__ == "__main__":
    args = parser.parse_args()
    env = gym.make("CartPole-v0")
    client = PolicyClient(
        f"http://localhost:{args.port}", inference_mode=args.inference_mode
    )

    # Get a dummy obs.
    dummy_obs = env.reset()
    dummy_action = 0
    dummy_reward = 0

    # Start an episode that is only used to get actions.
    action_eid = client.start_episode(training_enabled=False)

    # Get some actions using the action episode.
    _ = client.get_action(action_eid, dummy_obs)
    _ = client.get_action(action_eid, dummy_obs)

    # Start a log episode to log actions and returns for learning.
    log_eid = client.start_episode(training_enabled=True)

    # Log an action and a return.
    client.log_action(log_eid, dummy_obs, dummy_action)
    client.log_returns(log_eid, dummy_reward)

    # End the log episode.
    client.end_episode(log_eid, dummy_obs)

    # Continue getting actions using the action episode.
    # The bug happens when executing the following line:
    _ = client.get_action(action_eid, dummy_obs)
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Top GitHub Comments
Should be merged soon.
Great! And thanks for the quick effort! Shall I update my Ray to the latest version after the merge?