[rllib] What is the proper way to restore a checkpoint for fine-tuning / rendering / evaluation of a trained agent, based on example/multiagent_cartpole.py?
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- Ray installed from (source or binary): pip install ray
- Ray version: 0.6.5
- Python version: 3.6.2
- Exact command to reproduce:
Describe the problem
Before my question, let me introduce my understanding of the checkpoint file system. (You can skip this and jump straight to my question.)
The code in example/multiagent_cartpole.py produces a file like `experiment_state-2019-04-03_00-47-28.json` and a directory `PPO_experiment_name` with a few `.pkl`, `.json`, and `.csv` files in it.
The file system looks like:
- local_dir (say: `~/ray_results`)
  - exp_name (say: `PPO`)
    - checkpoints (say: `experiment_state-2019-04-05_17-59-00.json`)
    - directory (named like: `PPO_cartpole_0_2019-04-05_18-28-0296h2tknq`)
      - xxx.log
      - params.json
      - params.pkl (this is the file that stores the trained parameters, I guess?)
      - progress.csv
      - result.json
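For concreteness, here is a rough sketch of the kind of `tune.run` call that produces this layout. It is loosely modeled on the example script; the config below is a hypothetical single-agent stand-in, not the real multiagent setup:

```python
import ray
from ray import tune

ray.init()

# Hypothetical stand-in for the call in example/multiagent_cartpole.py;
# the real script builds a multiagent config.
tune.run(
    "PPO",
    name="PPO",                         # -> the exp_name directory
    local_dir="~/ray_results",          # -> the root of the tree above
    stop={"training_iteration": 300},
    config={"env": "CartPole-v0"},
)
```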
After one successful training run, we now have a trained agent (because I used one shared policy for all agents). We set the `local_dir` to exactly the same value as in training, and set the `exp_name` exactly as in training too, namely `PPO`.
Now here is my problem. The `tune.run` function takes two arguments that look helpful for restoring.
“resume” argument
The `resume` argument, once set to True, automatically searches `local_dir/exp_name/` for the most recent `experiment_state-<date_time>.json`.
Resuming seems to work: after setting it to True, the restore appears successful, but the program terminates immediately, as if it inherited the terminated state from the checkpoint.
Here’s the log:
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/1 GPUs
Memory usage on this node: 4.3/16.7 GB
Result logdir: /home/SENSETIME/pengzhenghao/ray_results/PPO
Number of trials: 1 ({'TERMINATED': 1})
TERMINATED trials:
- PPO_tollgate_0: TERMINATED, [12 CPUs, 1 GPUs], [pid=9214], 4846 s, 300 iter, 1320000 ts, 1.1e+03 rew
The printed reward is exactly what the trained agent is able to achieve, but I cannot continue training this agent, even if I set `num_iters` greater than the number of iterations in the last training run (namely 300).
What's more, it seems impossible to use the `resume` argument to specify a checkpoint by exact filename.
In a nutshell, my question on the `resume` argument is:
- What is this argument for? It seems to be used only for restoring checkpoints after unexpected failures, so it cannot be used to restore a specific checkpoint. Am I correct?
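For reference, this is the kind of call I am making (the config is a placeholder for my real one):

```python
import ray
from ray import tune

ray.init()

tune.run(
    "PPO",
    name="PPO",                  # must match the exp_name of the earlier run
    local_dir="~/ray_results",   # must match the local_dir of the earlier run
    resume=True,                 # picks up the most recent
                                 # experiment_state-<date_time>.json under
                                 # local_dir/exp_name/; trials that already
                                 # reached TERMINATED stay terminated
    config={"env": "CartPole-v0"},
)
```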
“restore” argument
After setting `restore=<log_dir>`, namely `restore="./experiments"` (which is my `log_dir`), it turns out to raise an error:
Traceback (most recent call last):
File "xxx/anaconda3/envs/dev/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 499, in restore
ray.get(trial.runner.restore.remote(value))
File "xxx/anaconda3/envs/dev/lib/python3.6/site-packages/ray/worker.py", line 2316, in get
raise value
ray.exceptions.RayTaskError: ray_PPOAgent:restore() (pid=28099, host=g114e1900387)
File "xxx/anaconda3/envs/dev/lib/python3.6/site-packages/ray/tune/trainable.py", line 304, in restore
with open(checkpoint_path + ".tune_metadata", "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './experiments.tune_metadata'
I have checked everywhere on this computer and there is no file ending with `.tune_metadata`. I am really confused.
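Looking at the traceback, line 304 of `trainable.py` opens `checkpoint_path + ".tune_metadata"`, which suggests that `restore` expects the path of a checkpoint *file* (next to which Tune writes a metadata file), not a directory. A tiny illustration with hypothetical paths:

```python
# restore() receives the checkpoint *file* path and derives the metadata
# path from it, so passing a directory like "./experiments" fails:
checkpoint_path = "checkpoint_10/checkpoint-10"       # hypothetical file
metadata_path = checkpoint_path + ".tune_metadata"    # opened by Tune itself
```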
In short, what I am trying to do is:
- Restore the trained agent and continue its training with the same config.
- Restore the trained agent, retrieve the policy network, and use it in the same environment with rendering, in order to visualize its performance (see the sketch right after this list).
- Restore the trained agent as a pre-trained agent and modify the config, such as using more workers and GPUs to train on a cluster.
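For the second goal, this is a rough sketch of what I am after, assuming the Ray 0.6.x API where agents are classes like `PPOAgent` (renamed `PPOTrainer` in later releases); the checkpoint path and config are placeholders:

```python
import os

import gym
import ray
from ray.rllib.agents.ppo import PPOAgent  # PPOTrainer in newer Ray versions

ray.init()

# The config must match the one used during training (placeholder here).
agent = PPOAgent(config={"num_workers": 1}, env="CartPole-v0")

# Pass the checkpoint *file*, not a directory (placeholder trial dir).
agent.restore(os.path.expanduser(
    "~/ray_results/PPO/<trial_dir>/checkpoint_300/checkpoint-300"))

# Roll out the restored policy with rendering.
env = gym.make("CartPole-v0")
obs = env.reset()
done = False
while not done:
    # For a multiagent setup, compute_action also takes a policy_id.
    action = agent.compute_action(obs)
    obs, reward, done, _ = env.step(action)
    env.render()
```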
Could you please tell me what I should do?
(By the way, the documentation is really insufficient for thoroughly understanding the whole RLlib workflow. Nevertheless, I still appreciate you guys for this excellent project, and I wish that some day I can make a contribution too~)
Top GitHub Comments
For the potential reader:
- The `resume` argument does nothing but continue the last unfinished task. In this mode, you are not allowed to reset the `num_iters`.
- The `restore` argument takes the path of the checkpoint file as input. Concretely, the file looks like `~/ray_results/expname/envname_date_someothercodes/checkpoint_10/checkpoint-10`. Note that checkpoint files only exist for those `tune.run()` executions with `checkpoint_at_end=True` or `checkpoint_freq` set to a non-zero value.
- Using the `restore` argument and passing the checkpoint from which you want to continue the experiment is the only way to enlarge the number of iterations of a finished or unfinished experiment.

Thanks, Eric, for offering quick and kind responses!
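To make the checkpointing note concrete, here is a sketch of a training run that actually writes checkpoint files (the config is a placeholder, not the multiagent setup from the example):

```python
import ray
from ray import tune

ray.init()

tune.run(
    "PPO",
    name="PPO",
    local_dir="~/ray_results",
    stop={"training_iteration": 300},
    checkpoint_freq=10,        # writes checkpoint_10/, checkpoint_20/, ...
    checkpoint_at_end=True,    # plus a final checkpoint when the trial stops
    config={"env": "CartPole-v0"},
)
```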
Since I was searching for a simple way to load a trained agent and continue training with RLlib, and I only found this issue, here is what I found and what is the easiest way in my opinion:
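A minimal sketch, with placeholder path and config:

```python
import os

from ray import tune

tune.run(
    "PPO",
    stop={"training_iteration": 600},  # larger than the original run's 300
    checkpoint_freq=10,
    restore=os.path.expanduser(
        "~/ray_results/PPO/<trial_dir>/checkpoint_300/checkpoint-300"),
    config={"env": "CartPole-v0"},
)
```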
I.e., just set the path in the `restore` argument; that's it! No need for a custom train function.