[Question] Thoughts on the ideal training setup to save the best agent during training when using multiple envs?
Question
I’ve made use of the following snippet:
```python
from stable_baselines3 import A2C
from stable_baselines3.common.vec_env import SubprocVecEnv

num_cpu = 6  # Number of processes to use
# Create the vectorized environment
env = WilKin_Stock_Trading_Environment(df, lookback_window_size=lookback_window_size)
env = SubprocVecEnv([make_env(env, i) for i in range(num_cpu)])
model = A2C('MlpPolicy', env, verbose=1, gamma=0.91)
```
With this, my understanding is that 6 agents are being trained at once, with the total_timesteps defined during training split among them, which results in 6 different test results.
How does one combine these results into 1 agent for testing? Is there a way to pick the best agent?
Or, how could one make use of a callback so that the best “agent” gets checkpointed as training goes along?
Was thinking of using this snippet for the latter:
```python
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback, CallbackList

checkpoint_callback = CheckpointCallback(save_freq=1000, save_path='./logs/')
eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/best_model',
                             log_path='./logs/results', eval_freq=500)
callback = CallbackList([checkpoint_callback, eval_callback])
```
Sort of a newbie with SB3. Thanks!
Top GitHub Comments
Well, the EvalCallback is the recommended way to go (it is the default in the RL Zoo); the example in the docs is just there to demonstrate the use of callbacks.

No, there is only one agent being trained, using six copies of the environment. This can speed up training (faster stepping of the environments) and also stabilizes the training of A2C/PPO, because you have a larger number of samples per update.
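As a rough sketch of what that setup looks like (assuming the WilKin_Stock_Trading_Environment, df and lookback_window_size from the question), SubprocVecEnv is given a list of callables, each of which builds its own copy of the environment, while a single A2C model learns from all of them:

```python
from stable_baselines3 import A2C
from stable_baselines3.common.vec_env import SubprocVecEnv

num_cpu = 6  # number of environment copies / worker processes

def make_env(rank):
    # rank is unused here, but can be used for per-worker seeding if needed
    def _init():
        # each worker process constructs its own copy of the environment
        return WilKin_Stock_Trading_Environment(df, lookback_window_size=lookback_window_size)
    return _init

env = SubprocVecEnv([make_env(i) for i in range(num_cpu)])

# one agent, one set of weights, updated from rollouts collected in all six copies
model = A2C('MlpPolicy', env, verbose=1, gamma=0.91)
```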
See this example on how to use callbacks to save the best model.
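For reference, a minimal sketch of how the callbacks could be wired up with the model above (the evaluation environment, the 100_000 timesteps and the ./logs paths are placeholders, not from the original thread):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback, CallbackList

# separate, single copy of the environment used only for evaluation
eval_env = WilKin_Stock_Trading_Environment(df, lookback_window_size=lookback_window_size)

checkpoint_callback = CheckpointCallback(save_freq=1000, save_path='./logs/')
eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/best_model',
                             log_path='./logs/results', eval_freq=500,
                             deterministic=True)
callback = CallbackList([checkpoint_callback, eval_callback])

model.learn(total_timesteps=100_000, callback=callback)

# the model with the highest mean evaluation reward was saved by EvalCallback
best_model = A2C.load('./logs/best_model/best_model')
```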
The following is an automated answer:
As you seem to be trying to apply RL to stock trading, I must also warn you about it. Here is a recommendation from a former professional trader: