
[Question] Thoughts on ideal training environment to save best agent during training when using multiple envs?

See original GitHub issue

Question

I’ve made use of the following snippet:

    from stable_baselines3 import A2C
    from stable_baselines3.common.vec_env import SubprocVecEnv

    num_cpu = 6  # Number of parallel environment processes to use
    # Create the vectorized environment
    # (make_env(env, i) is assumed to return a callable that builds a copy of the env for worker i)
    env = WilKin_Stock_Trading_Environment(df, lookback_window_size=lookback_window_size)
    env = SubprocVecEnv([make_env(env, i) for i in range(num_cpu)])

    model = A2C('MlpPolicy', env, verbose=1, gamma=0.91)

With this, it’s my understanding that there are 6 agents being trained at once, split up among the total_timesteps defined during training, which results in 6 different test results.

How does one combine these results into 1 agent for testing? Is there a way to pick the best agent?

Or, how could one make use of a callback so that the best “agent” gets checkpointed as training goes along?

Was thinking of using this snippet for the latter:

    from stable_baselines3.common.callbacks import CallbackList, CheckpointCallback, EvalCallback

    # eval_env: a separate instance of the environment, used only for evaluation
    checkpoint_callback = CheckpointCallback(save_freq=1000, save_path='./logs/')
    eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/best_model',
                                 log_path='./logs/results', eval_freq=500)
    callback = CallbackList([checkpoint_callback, eval_callback])

Sort of a newbie with SB3. Thanks!
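
For reference, a snippet like the one above only builds the callback list; it takes effect once it is passed to learn(). A minimal, hypothetical sketch (total_timesteps is an arbitrary example value):

    # Note: with a vectorized env, save_freq/eval_freq are counted per call to
    # env.step(), i.e. per num_cpu environment steps.
    model.learn(total_timesteps=100_000, callback=callback)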

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

3 reactions
araffin commented, Jan 2, 2022

See this example on how to use callbacks to save the best model.

Well, the EvalCallback is the recommended way to go (it is the default in the RL Zoo); the example in the docs is just there to demonstrate how to use callbacks.
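
For illustration, a minimal sketch (not from the original comments) of what the EvalCallback workflow looks like end to end; the paths mirror the snippet in the question, and eval_env is assumed to be a separate evaluation environment:

    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy

    # During training, EvalCallback writes the best-scoring checkpoint to
    # <best_model_save_path>/best_model.zip. Load it afterwards for testing.
    best_model = A2C.load('./logs/best_model/best_model')

    # Test the single best agent on a held-out evaluation environment.
    mean_reward, std_reward = evaluate_policy(best_model, eval_env, n_eval_episodes=10)
    print(f'best model: {mean_reward:.2f} +/- {std_reward:.2f}')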

2 reactions
Miffyli commented, Jan 2, 2022

No, there is only one agent being trained, using six copies of the environment. This can speed up training (faster stepping of the environment) and also stabilizes the training of A2C/PPO, because each update uses a larger number of samples.
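
For illustration, a minimal sketch of that pattern (not from the original comment): SubprocVecEnv takes factory functions so that each worker process builds its own copy of the environment, while a single A2C model consumes the samples from all of them. WilKin_Stock_Trading_Environment, df and lookback_window_size are taken from the question:

    from stable_baselines3 import A2C
    from stable_baselines3.common.vec_env import SubprocVecEnv

    def make_env(rank):
        # Factory: each worker process constructs its own, independent env copy.
        # (rank could be used to seed each copy differently.)
        def _init():
            return WilKin_Stock_Trading_Environment(df, lookback_window_size=lookback_window_size)
        return _init

    if __name__ == '__main__':  # required by SubprocVecEnv when subprocesses are spawned
        num_cpu = 6
        vec_env = SubprocVecEnv([make_env(i) for i in range(num_cpu)])
        # One agent; its rollouts are collected from six env copies in parallel.
        model = A2C('MlpPolicy', vec_env, verbose=1, gamma=0.91)
        model.learn(total_timesteps=100_000)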

See this example on how to use callbacks to save the best model.


The following is an automated answer:

As you seem to be trying to apply RL to stock trading, I must also warn you about it. Here is a recommendation from a former professional trader:

Retail trading, retail trading with ML, and retail trading with RL are bad ideas for almost everyone to get involved with.

  • I was a quant trader at a major hedge fund for several years. I am now retired.
  • On average, traders lose money. On average, retail traders especially lose money. An excellent approximation of trading, and especially of retail trading, is ‘gambling’.
  • There is a lot more bad advice on trading out there than good advice. It is extraordinarily difficult to demonstrate that any particular advice is some of the rare good advice.
  • As such, it’s reasonable to treat all commentary on retail trading as an epsilon away from snake oil salesmanship. Sometimes that’ll be wrong, but it’s a strong rule of thumb.
  • I feel a sense of responsibility to the less world-wise members of this community - which includes plenty of high schoolers - and so I find myself unable to let a conversation about retail trading occur without interceding and warning that it’s very likely snake oil.
  • I find repeatedly making these warnings and the subsequent fights to be exhausting.