[rllib] Periodic policy evaluation in the course of training
Problem description: Suppose we can instantiate several environment simulators with predefined dynamics (source, or train, tasks) and one environment instance with slightly modified dynamics (target, or test, task).
The aim is to run policy optimization on episodes from the source domain and periodically check the agent's progress on a sample from the target environment, such as:
repeat:
1. run policy optimization for ~10 episodes from source environment instances;
2. run ~1 episode of policy evaluation on the target environment;
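The schedule above can be sketched in plain Python. This is only an illustration of the intended control flow, not RLlib code: `run_schedule`, `train_episode`, and `eval_episode` are hypothetical names, and the two callbacks stand in for whatever actually optimizes the policy on a source environment and rolls it out on the target one.

```python
def run_schedule(train_episode, eval_episode, rounds,
                 train_per_round=10, eval_per_round=1):
    """Alternate blocks of training episodes with evaluation episodes.

    train_episode: hypothetical callable that runs one optimization episode
                   on a source environment (return value is ignored).
    eval_episode:  hypothetical callable that runs one evaluation episode
                   on the target environment and returns its score.
    Returns the mean evaluation score of each round.
    """
    history = []
    for _ in range(rounds):
        # Step 1: policy optimization on source-environment episodes.
        for _ in range(train_per_round):
            train_episode()
        # Step 2: evaluation only; nothing here updates the policy.
        scores = [eval_episode() for _ in range(eval_per_round)]
        history.append(sum(scores) / len(scores))
    return history


# Stand-in callbacks that only count calls, to show the interleaving.
counts = {"train": 0, "eval": 0}


def fake_train():
    counts["train"] += 1


def fake_eval():
    counts["eval"] += 1
    # Pretend the score equals the number of training episodes seen so far.
    return float(counts["train"])


history = run_schedule(fake_train, fake_eval, rounds=3)
```

Keeping evaluation behind its own callback is what makes it possible to move it to a separate worker later without touching the training step.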
It is desirable to run the evaluation task as a separate worker to prevent knowledge leakage (as opposed to the ‘just-set-trainer-learning-rate-to-zero’ approach).
It is also highly desirable to run the whole experiment from the Tune Python API and log the run results under an ‘evaluate’ tag in the TensorBoard summaries.
Question: Is there any predefined solution for setting up such a workflow? If not, is there a suggested way to implement this?
My search through the docs only turned up the checkpoint/load/evaluate routine from the command-line API: https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies
Issue Analytics
- Created 4 years ago
- Reactions:4
- Comments:17 (10 by maintainers)
Top GitHub Comments
@alversafa, you should add two keys (“evaluation_interval” and “evaluation_num_episodes”) to the trainer config to enable periodic evaluation; you can also add another key, “evaluation_config”, containing a dictionary of top-level config overrides (for example, env_config). Those values override the basic (train) settings when instantiating the evaluators. A simple example of such a setup can be found here:
https://github.com/ray-project/ray/blob/master/rllib/tests/test_evaluators.py#L30
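As a rough sketch of the configuration described above, assuming the three evaluation keys sit at the top level of the trainer config and that "evaluation_interval" counts training iterations rather than episodes. The env_config contents ("dynamics": "source"/"target") and the helper `effective_eval_config` are hypothetical, and the helper only approximates the override with a shallow top-level merge; RLlib's own merging may differ.

```python
import copy

# Training config as a plain dict. The "dynamics" entry is a made-up
# env_config key used only to tell source and target environments apart.
train_config = {
    "env_config": {"dynamics": "source"},
    # Run evaluation once every 10 training iterations (assumed unit).
    "evaluation_interval": 10,
    # Number of episodes to roll out in each evaluation round.
    "evaluation_num_episodes": 1,
    # Top-level keys here replace the training values for evaluators.
    "evaluation_config": {
        "env_config": {"dynamics": "target"},
    },
}


def effective_eval_config(config):
    """Hypothetical helper: overlay evaluation_config on the training config.

    This is a shallow, top-level merge written for illustration only.
    """
    merged = copy.deepcopy(config)
    merged.update(copy.deepcopy(config.get("evaluation_config", {})))
    return merged


eval_config = effective_eval_config(train_config)
# The dict would then be passed as the trainer config, for example through
# the `config` argument of a Tune experiment.
```

With this layout the training workers keep seeing the source dynamics while the evaluators are built against the target dynamics.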
@Kismuz, I got it working by creating a deterministic version of the Categorical class in https://github.com/ray-project/ray/blob/3d9bd64591506c2d3cd79d18c96908c996b52c3f/rllib/models/tf/tf_action_dist.py#L41 where I take the argmax (instead of tf.multinomial(.)) in the following line: https://github.com/ray-project/ray/blob/3d9bd64591506c2d3cd79d18c96908c996b52c3f/rllib/models/tf/tf_action_dist.py#L78
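The idea of swapping sampling for argmax can be illustrated without TensorFlow. The classes below are a standalone stand-in written for this sketch, not RLlib's Categorical: the stochastic version draws an action from the softmax of the logits, and the hypothetical `DeterministicCategorical` subclass always returns the highest-logit action.

```python
import math
import random


class Categorical:
    """Stand-in categorical distribution over action logits (not RLlib's)."""

    def __init__(self, logits, seed=None):
        self.logits = list(logits)
        self._rng = random.Random(seed)

    def probs(self):
        # Numerically stable softmax over the logits.
        m = max(self.logits)
        exps = [math.exp(x - m) for x in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self):
        # Stochastic action: draw one index in proportion to its probability.
        actions = range(len(self.logits))
        return self._rng.choices(actions, weights=self.probs(), k=1)[0]


class DeterministicCategorical(Categorical):
    """Same interface, but sample() returns the argmax of the logits."""

    def sample(self):
        return max(range(len(self.logits)), key=self.logits.__getitem__)


logits = [0.1, 2.0, -1.0]
stochastic = Categorical(logits, seed=0)
greedy = DeterministicCategorical(logits)

greedy_actions = [greedy.sample() for _ in range(20)]
stochastic_action = stochastic.sample()
```

Because the subclass keeps the same interface, it could be used for evaluation rollouts only, leaving exploration during training unchanged.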
Thanks for all the help.