[rllib] Periodic policy evaluation in the course of training

Problem description: Suppose we can instantiate several environment simulators with predefined dynamics (source, or train, tasks) and one environment instance with slightly modified dynamics (target, or test, task).

The aim is to run policy optimization on episodes from the source domain and periodically check the agent's progress on a sample from the target environment, for example:

repeat:
   1. run policy optimization for ~10 episodes on the source environment instances;
   2. run policy evaluation for ~1 episode on the target environment;
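
The loop above can be sketched in plain Python as follows. This is only an illustration of the intended control flow, not RLlib code: `train_episode` and `eval_episode` are hypothetical callables standing in for "optimize on one source episode" and "roll out one target episode and return its reward".

```python
from typing import Callable, List


def train_with_periodic_eval(
    train_episode: Callable[[], None],
    eval_episode: Callable[[], float],
    num_rounds: int,
    train_episodes_per_round: int = 10,
    eval_episodes_per_round: int = 1,
) -> List[float]:
    """Alternate training on the source task with evaluation on the target task.

    Both callables are hypothetical placeholders supplied by the caller.
    Returns the mean evaluation reward recorded after each round.
    """
    eval_history: List[float] = []
    for _ in range(num_rounds):
        # Step 1: policy optimization on source-domain episodes.
        for _ in range(train_episodes_per_round):
            train_episode()
        # Step 2: evaluation only, on the target environment (no learning here).
        rewards = [eval_episode() for _ in range(eval_episodes_per_round)]
        eval_history.append(sum(rewards) / len(rewards))
    return eval_history
```

Keeping the evaluation rollout behind its own callable makes it straightforward to move it into a separate worker later.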

It is desirable to run the evaluation task in a separate worker to prevent knowledge leakage (as opposed to the ‘just-set-trainer-learning-rate-to-zero’ approach). It is also highly desirable to run the whole experiment from the Tune Python API and to log evaluation results to the TensorBoard summaries under an ‘evaluate’ tag.

Question: Is there a predefined solution for setting up such a workflow? If not, is there a suggested way to implement it?

My search through the docs only turned up the checkpoint/load/evaluate routine from the command-line API: https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies

[Possibly] related: #2799, #4569 and #4496

Top GitHub Comments

3 reactions

Kismuz commented, Dec 30, 2019

@alversafa, you should add two keys (“evaluation_interval” and “evaluation_num_episodes”) to the trainer config to enable periodic evaluation. You can also add a third key, “evaluation_config”, containing a dictionary of top-level config keys (for example env_config). Those keys override the basic (train) settings when the evaluators are instantiated. A simple example of such a setup can be found here:

https://github.com/ray-project/ray/blob/master/rllib/tests/test_evaluators.py#L30
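
As a rough sketch of what such a config might look like for the source/target setup in the question: the three `evaluation_*` key names are the ones quoted in the comment above, while the environment name `"MyEnv-v0"` and the `"dynamics"` entry are hypothetical placeholders. Key names may differ in other Ray releases, so check them against the version you have installed.

```python
from typing import Any, Dict


def build_config(
    env: str,
    source_env_config: Dict[str, Any],
    target_env_config: Dict[str, Any],
    evaluation_interval: int = 10,
    evaluation_num_episodes: int = 1,
) -> Dict[str, Any]:
    """Build a trainer config that trains on the source task and
    periodically evaluates on the target task."""
    return {
        "env": env,
        # Settings passed to the environments used for training.
        "env_config": dict(source_env_config),
        # Run evaluation every N training iterations.
        "evaluation_interval": evaluation_interval,
        # Number of episodes to roll out per evaluation run.
        "evaluation_num_episodes": evaluation_num_episodes,
        # Keys here override the training settings for evaluators only.
        "evaluation_config": {"env_config": dict(target_env_config)},
    }


def run_experiment(config: Dict[str, Any]) -> None:
    """Launch the experiment through Tune (requires Ray to be installed)."""
    # Imported lazily so build_config can be used without Ray present.
    import ray
    from ray import tune

    ray.init()
    tune.run("PPO", config=config, stop={"training_iteration": 100})


# "MyEnv-v0" and the "dynamics" key are hypothetical; a custom environment
# would need to be registered with Ray before run_experiment(config) is called.
config = build_config(
    env="MyEnv-v0",
    source_env_config={"dynamics": "source"},
    target_env_config={"dynamics": "target"},
)
```

Note that the interval counts training iterations rather than episodes, so "about 10 source episodes per evaluation" only holds approximately, depending on the batch size settings.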

2 reactions

alversafa commented, Feb 15, 2020

@Kismuz,

I got it working by creating a deterministic version of the Categorical class in: https://github.com/ray-project/ray/blob/3d9bd64591506c2d3cd79d18c96908c996b52c3f/rllib/models/tf/tf_action_dist.py#L41

where I take the argmax (instead of tf.multinomial(.)) in the following line:

https://github.com/ray-project/ray/blob/3d9bd64591506c2d3cd79d18c96908c996b52c3f/rllib/models/tf/tf_action_dist.py#L78

Thanks for all the help.
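
The difference between sampling and taking the argmax, as described in the comment above, can be illustrated with a small standard-library sketch. This is not the RLlib `Categorical` class or its TensorFlow code; the function names here are hypothetical and only show the idea on a plain list of logits.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: Sequence[float], rng: random.Random) -> int:
    """Stochastic choice: draw an action index from the softmax distribution."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]


def greedy_action(logits: Sequence[float]) -> int:
    """Deterministic choice: always pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [0.1, 2.0, -1.0]
stochastic = sample_action(logits, random.Random(0))  # varies with the RNG
deterministic = greedy_action(logits)  # same result on every call
```

Using the greedy variant during evaluation removes sampling noise from the reported episode rewards, at the cost of no longer measuring the stochastic policy that was actually trained.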
