[Bug] RLlib episode handling with concurrent episodes
Search before asking
- I searched the issues and found no similar issues.
Ray Component
RLlib
What happened + What you expected to happen
In my project, I use ExternalEnv to train a policy. For practical reasons, I need to create one long-lived "action episode" that is used only to continuously get actions. Meanwhile, separate episodes ("log episodes") are started to log actions and returns, and are then terminated. An error occurs after a "log episode" has ended and the get_action API is then called on the "action episode".
Error message:

--- Logging error ---
Traceback (most recent call last):
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/env/policy_client.py", line 273, in run
    samples = self.rollout_worker.sample()
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 757, in sample
    batches = [self.input_reader.next()]
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 103, in next
    batches = [self.get_data()]
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 265, in get_data
    item = next(self._env_runner)
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 656, in _env_runner
    eval_results = _do_policy_eval(
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 1068, in _do_policy_eval
    input_dict = sample_collector.get_inference_input_dict(policy_id)
  File "/Users/ewayuwa/anaconda3/envs/rai_ray192/lib/python3.8/site-packages/ray/rllib/evaluation/collectors/simple_list_collector.py", line 627, in get_inference_input_dict
    collector = self.agent_collectors[k]
KeyError: (1460398824, 'agent0')
Reproduction: The server is the simple "CartPole-v0" server from the RLlib "External Agents and Applications" documentation section; the client script is provided under "Reproduction script" below, and a minimal server sketch follows here.
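Since the issue only points at the docs' server, the following is a minimal sketch modeled on RLlib's cartpole_server.py example for Ray 1.9.x. The trainer class (PPOTrainer), spaces, and config values below are assumptions for illustration, not necessarily the reporter's exact setup:

#!/usr/bin/env python
# Minimal policy server sketch (assumptions: PPOTrainer, CartPole spaces).
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env.policy_server_input import PolicyServerInput

SERVER_ADDRESS = "localhost"
SERVER_PORT = 9900  # Must match the client's --port.

if __name__ == "__main__":
    ray.init()
    trainer = PPOTrainer(
        env=None,  # No local env; observations come in from PolicyClients.
        config={
            # Serve experiences from connected PolicyClient instances.
            "input": lambda ioctx: PolicyServerInput(
                ioctx, SERVER_ADDRESS, SERVER_PORT
            ),
            # Spaces must be given explicitly because env=None.
            "observation_space": gym.spaces.Box(
                float("-inf"), float("inf"), (4,)
            ),
            "action_space": gym.spaces.Discrete(2),
            "num_workers": 0,
            "input_evaluation": [],
        },
    )
    while True:
        print(trainer.train())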
Suggested fix: Some debugging shows that self.forward_pass_agent_keys in simple_list_collector.py's get_inference_input_dict() is not updated when an episode ends. I added a method that updates this variable in simple_list_collector.py and called it from sampler.py's _process_observations(); a sketch of the idea follows. It seems to work, but I'm sure the Ray team will come up with a more proper solution.
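For illustration, here is a hypothetical sketch of that workaround. The method name is invented, and the attribute names follow the Ray 1.9 simple_list_collector.py internals as visible in the traceback; this is not the fix the Ray team merged:

# In rllib/evaluation/collectors/simple_list_collector.py, inside class
# SimpleListCollector (sketch only; names other than those in the
# traceback are assumptions):
def _purge_stale_forward_pass_keys(self, policy_id):
    """Drop forward-pass keys whose AgentCollector was already removed
    because its episode ended."""
    keys = self.forward_pass_agent_keys[policy_id]
    live = [k for k in keys if k in self.agent_collectors]
    self.forward_pass_agent_keys[policy_id] = live
    self.forward_pass_size[policy_id] = len(live)

# sampler.py's _process_observations() would then call
# sample_collector._purge_stale_forward_pass_keys(policy_id) whenever an
# episode ends, so that get_inference_input_dict() never indexes
# self.agent_collectors with a (episode_id, agent_id) key belonging to an
# already-finished episode -- which is exactly the KeyError above.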
Versions / Dependencies
- Python: 3.8.12
- Ray: 1.9.2
Reproduction script
#!/usr/bin/env python
"""
Example to reproduce a bug related to episode handling.
"""
import argparse

import gym

from ray.rllib.env.policy_client import PolicyClient

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-train", action="store_true", help="Whether to disable training."
)
parser.add_argument(
    "--inference-mode", type=str, default="local", choices=["local", "remote"]
)
parser.add_argument(
    "--off-policy",
    action="store_true",
    help="Whether to compute random actions instead of on-policy "
    "(Policy-computed) ones.",
)
parser.add_argument(
    "--stop-reward",
    type=float,
    default=9999,
    help="Stop once the specified reward is reached.",
)
parser.add_argument(
    "--port", type=int, default=9900, help="The port to use (on localhost)."
)

if __name__ == "__main__":
    args = parser.parse_args()
    env = gym.make("CartPole-v0")
    client = PolicyClient(
        f"http://localhost:{args.port}", inference_mode=args.inference_mode
    )

    # Get a dummy obs.
    dummy_obs = env.reset()
    dummy_action = 0
    dummy_reward = 0

    # Start an episode that is only used to get actions.
    action_eid = client.start_episode(training_enabled=False)

    # Get some actions using the action episode.
    _ = client.get_action(action_eid, dummy_obs)
    _ = client.get_action(action_eid, dummy_obs)

    # Start a log episode to log actions and returns for learning.
    log_eid = client.start_episode(training_enabled=True)

    # Log an action and a return.
    client.log_action(log_eid, dummy_obs, dummy_action)
    client.log_returns(log_eid, dummy_reward)

    # End the log episode.
    client.end_episode(log_eid, dummy_obs)

    # Continue getting actions using the action episode.
    # The bug happens when executing the following line:
    _ = client.get_action(action_eid, dummy_obs)
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Top GitHub Comments
Should be merged soon.
Great! And thanks for the quick effort! Shall I update my Ray to the latest version after the merge?