
[rllib] Periodic policy evaluation in the course of training

See original GitHub issue

Problem description: Suppose we can instantiate several environment simulators with predefined dynamics (the source, or train, tasks) and an instance of an environment with slightly modified dynamics (the target, or test, task).

The aim is to run policy optimization on episodes from the source domain and periodically check the agent's progress on samples from the target environment, roughly as follows:

repeat:
   1. run policy optimization for ~10 episodes on the source environment instances;
   2. run ~1 episode of policy evaluation on the target environment;

It is desirable to run the evaluation task as a separate worker to prevent knowledge leakage (as opposed to a 'just set the trainer's learning rate to zero' approach). It is also highly desirable to run the whole experiment from the Tune Python API and to log the evaluation results under an 'evaluate' tag in the TensorBoard summaries.

Question: Is there a predefined solution for setting up such a workflow? If not, is there a suggested way to implement it?

My search through the docs only turned up the checkpoint/load/evaluate routine of the command-line API: https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies

[Possibly] related: #2799, #4569 and #4496
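
For concreteness, here is a minimal hand-written sketch of the requested loop, assuming the RLlib Trainer API and the old gym reset/step API of that era; "SourceEnv-v0" and "TargetEnv-v0" are hypothetical registered environment IDs. This bare loop does not satisfy the separate-evaluation-worker or TensorBoard requirements; RLlib's built-in evaluation settings, discussed in the first comment below, do.

# Hand-rolled sketch: train on the source dynamics, then roll out one
# evaluation episode on the target dynamics.
# "SourceEnv-v0" / "TargetEnv-v0" are placeholder gym environment IDs.
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
trainer = PPOTrainer(config={"env": "SourceEnv-v0"})
target_env = gym.make("TargetEnv-v0")

for iteration in range(100):
    trainer.train()  # one training iteration (several source-env episodes)

    # one evaluation episode on the target environment
    obs, done, episode_return = target_env.reset(), False, 0.0
    while not done:
        # actions are sampled from the policy's action distribution, so this
        # evaluation is stochastic unless the distribution is made greedy
        action = trainer.compute_action(obs)
        obs, reward, done, _ = target_env.step(action)
        episode_return += reward
    print(f"iteration {iteration}: target-env return {episode_return:.2f}")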

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 4
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

3 reactions
Kismuz commented, Dec 30, 2019

@alversafa, you should add two keys ("evaluation_interval" and "evaluation_num_episodes") to the trainer config to enable periodic evaluation; you can also add an "evaluation_config" key containing a dictionary of top-level config overrides. Those overrides replace the base (training) keys when the evaluation workers are instantiated. A simple example of such a setup can be found here:

https://github.com/ray-project/ray/blob/master/rllib/tests/test_evaluators.py#L30
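
A hedged sketch of that setup launched through the Tune API, as the original question asks for; the environment IDs are placeholders, and overriding "env" inside "evaluation_config" is an assumption based on the override mechanism described in the comment above. Evaluation results are reported under an "evaluation" section of each training result, which Tune also writes to its TensorBoard summaries.

# Periodic evaluation driven by RLlib's own evaluation workers, run via Tune.
from ray import tune

tune.run(
    "PPO",
    stop={"training_iteration": 100},
    config={
        "env": "SourceEnv-v0",           # source (train) dynamics
        "evaluation_interval": 10,       # evaluate every 10 training iterations
        "evaluation_num_episodes": 1,    # ~1 episode per evaluation round
        "evaluation_config": {
            "env": "TargetEnv-v0",       # override: evaluate on the target dynamics
        },
    },
)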

2 reactions
alversafa commented, Feb 15, 2020

@Kismuz,

I managed it by creating a deterministic version of the Categorical class in: https://github.com/ray-project/ray/blob/3d9bd64591506c2d3cd79d18c96908c996b52c3f/rllib/models/tf/tf_action_dist.py#L41

where I take the argmax (instead of tf.multinomial()) in the following line:

https://github.com/ray-project/ray/blob/3d9bd64591506c2d3cd79d18c96908c996b52c3f/rllib/models/tf/tf_action_dist.py#L78

Thanks for all the help.
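
For reference, a sketch of the change alversafa describes, assuming the Categorical class and the _build_sample_op hook from the linked tf_action_dist.py revision (names and signatures may differ in newer RLlib versions):

# A Categorical action distribution whose sampling op takes the argmax of the
# logits instead of drawing from tf.multinomial, making rollouts deterministic.
import tensorflow as tf
from ray.rllib.models.tf.tf_action_dist import Categorical


class DeterministicCategorical(Categorical):
    """Greedy variant of Categorical: always picks the most likely action."""

    def _build_sample_op(self):
        # original: tf.squeeze(tf.multinomial(self.inputs, 1), axis=1)
        return tf.argmax(self.inputs, axis=1)

Note that later RLlib versions expose an "explore": False setting (usable inside "evaluation_config"), which makes evaluation actions greedy without a custom distribution class.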

Read more comments on GitHub >

Top Results From Across the Web

  • Getting Started with RLlib — Ray 2.2.0 - the Ray documentation: You can evaluate the trained algorithm with the following command (assuming ... that can be used to visualize training process with TensorBoard by...
  • How To Customize Policies — Ray 2.2.0: In this example, we'll dive into how PPO is defined within RLlib and how you can modify it. First, check out the PPO...
  • RLlib Training APIs — Ray 0.8.4 documentation: At a high level, RLlib provides a Trainer class which holds a policy for environment ... In order to save checkpoints from which...
  • Recommended way to evaluate training results - RLlib - Ray: Evaluating Trained Policies: which uses checkpoints and rollout to do evaluation for a particular number of timesteps. Customized Evaluation ...
  • RLlib Concepts and Custom Algorithms - the Ray documentation: Policy classes encapsulate the core numerical components of RL algorithms. ... are defined over batches of trajectory data produced by policy evaluation.
