[question] What is the proper way to log metrics at the end of each epoch when epochs are variable in length?
Problem description
I am training a PPO model for stock trading using a custom gym environment called StockTradingEnv. Each "epoch" of training is variable in length, since the epoch ends under one of two conditions: 1) the agent loses all of its initial money, or 2) the agent reaches the end of the data frame/time series (without having lost all of its money). I would like to log the net change in the agent's balance at the end of each of these epochs. To do so, I maintain an array within the environment, StockTradingEnv.list_networth, containing the agent's net worth at each time step, and reset it (i.e. empty the array) at the start of each new epoch.
I attempted to create a subclass of BaseCallback, called TensorboardCallback, with a very simple _on_step() method: it checks StockTradingEnv.done, and if it is True, logs the net_change for that epoch (the difference between the values at the last and first indexes of StockTradingEnv.list_networth). However, it appears that PPO only invokes its callbacks every n_steps, and n_steps=1 is not permitted, as per the documentation:
:param n_steps: The number of steps to run for each environment per update (i.e. rollout buffer size is n_steps * n_envs where n_envs is number of environment copies running in parallel) NOTE: n_steps * n_envs must be greater than 1 (because of the advantage normalization)
Even with n_steps=2, it is possible that an epoch ends on, say, step 1001 (not divisible by 2), and thus no net_change will be logged for that epoch.
What is the proper solution, using stable-baselines3, to log metrics from the environment systematically at the end of each epoch, when the epoch lengths are not a constant number of steps?
Code
For the sake of brevity, I did not include the code for the custom environment here. I can always add this if someone deems it necessary.
import pandas as pd

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.callbacks import BaseCallback

from env import StockTradingEnv  # a custom gym environment for stock trading


# A custom callback
class TensorboardCallback(BaseCallback):
    """Logs the net change in cash between the beginning and end of each epoch/run."""

    def __init__(self, verbose=0):
        super(TensorboardCallback, self).__init__(verbose)
        self.env = self.training_env.envs[0]

    def _on_step(self) -> bool:
        if self.env.done:
            net_change = self.env.list_networth[-1] - self.env.list_networth[0]
            self.logger.record("net_change", net_change)
        return True


# Load training data
WMT_Train = pd.read_csv("WMT_Train.csv")

# Instantiate the custom environment
env = DummyVecEnv([lambda: StockTradingEnv(WMT_Train, start=0, end=10000, look_back=10)])

# Instantiate model
model = PPO('MlpPolicy', env, learning_rate=0.0001, verbose=0, ent_coef=0.5,
            tensorboard_log="./ppo_log", n_steps=128)

# Fit model using the custom callback
model.learn(total_timesteps=500000, tb_log_name="PPO_log", callback=TensorboardCallback())
System Info
- Python version: 3.9.7
- Stable-baselines3 (version 1.2.0), installed via pip: pip install 'stable-baselines3[extra]'
- Tensorflow version: 2.6.0
- Gym version: 0.20.0
Top GitHub Comments
Hmm, I am a bit confused about the concept of an epoch here. It sounds like what you mean is an episode (from reset to done=True in an environment)? If that is the case, a simple Monitor wrapper (see the examples on how to add this) would do the trick: it saves data on each individual episode into a csv file you can then load up later. At least, this is what I understood from your description (sorry for not suggesting this earlier, I was under the impression you might have tried this). You should probably open this issue on the stable-baselines3 repository 😃.
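For reference, a minimal sketch of what that Monitor setup could look like. The output path and the assumption that the environment puts a "net_change" key into its info dict on the step where done=True are illustrative, not taken from the original issue:

import pandas as pd

from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv

from env import StockTradingEnv  # the custom environment from the question

WMT_Train = pd.read_csv("WMT_Train.csv")

# Wrap the raw environment with Monitor *before* vectorizing it. Monitor writes
# the reward, length and wall-clock time of every finished episode to a csv file,
# plus any extra keys listed in info_keywords, provided the environment reports
# those keys in the `info` dict at the end of an episode.
env = DummyVecEnv([
    lambda: Monitor(
        StockTradingEnv(WMT_Train, start=0, end=10000, look_back=10),
        filename="./monitor_logs/ppo_stock",   # hypothetical output path
        info_keywords=("net_change",),         # assumes the env exposes this key in info
    )
])

The csv produced by Monitor then has one row per episode, so per-episode statistics can be computed afterwards regardless of how many steps each episode lasted.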
But to answer your question: if I understand correctly, you want to log stats each time PPO is updated. In that case you should use _on_rollout_start and _on_rollout_end (the former is called when new samples are collected, the latter when the sampling is done; training also happens once per rollout start/end).